52 datasets found
  1. Twitter users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 13, 2024
    Cite
    Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has been continuously increasing over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.

  2. Twitter Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 24, 2024
    + more versions
    Cite
    Bright Data, Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 24, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.

  3. A study on real graphs of fake news spreading on Twitter

    • data.niaid.nih.gov
    Updated Aug 20, 2021
    Cite
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
    Explore at:
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Amirhosein Bodaghi
    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we have focused on fake news stories that gave rise to a simultaneous truth-spreading process against them. The story behind each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    The data collection was done in two stages, each providing a new dataset: (1) obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; (2) querying the neighbors of tweet spreaders, which provides the Dataset of Graph (DG).

    DD

    DD for each fake news story is an excel file, named FNx_DD where x is the number of the fake news story, with the following structure:

    Each row belongs to one captured tweet/retweet related to the rumor, and each column presents specific information about that tweet/retweet. From left to right, the columns are:

    User ID (user who has posted the current tweet/retweet)

    The description sentence in the profile of the user who has published the tweet/retweet

    The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

    Date and time of creation of the account by which the current tweet/retweet has been posted

    Language of the tweet/retweet

    Number of followers

    Number of followings (friends)

    Date and time of posting the current tweet/retweet

    Number of likes (favorites) the current tweet had acquired before it was crawled

    Number of times the current tweet had been retweeted before it was crawled

    Whether another tweet is embedded in the current tweet/retweet (for example, when the current tweet is a quote, reply, or retweet)

    The source (device/OS) from which the current tweet/retweet was posted

    Tweet/Retweet ID

    Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

    Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

    Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied to by the current post)

    Frequency of occurrence, i.e., the number of times the current tweet is repeated in the dataset (for example, as retweets posted by others)

    State of the tweet, which can be one of the following (decided by agreement between the annotators):

    r : The tweet/retweet is a fake news post

    a : The tweet/retweet is a truth post

    q : The tweet/retweet questions the fake news but neither confirms nor denies it

    n : The tweet/retweet is not related to the fake news (it matches the rumor-related queries but does not refer to the given fake news)
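
    To make the column layout concrete, here is a minimal sketch of loading one DD file with pandas and tallying the annotation states. The column names, the .xlsx extension, and the absence of a header row are my assumptions, not part of the dataset documentation:

    import pandas as pd

    # Column names follow the order described above (assumed, not documented).
    columns = [
        "user_id", "profile_description", "user_tweet_count", "account_created",
        "language", "followers", "followings", "posted_at", "like_count",
        "retweet_count", "embedded_tweet", "source", "tweet_id",
        "retweet_id", "quote_id", "reply_id", "frequency", "state",
    ]

    # Load the diffusion dataset for fake news story 1 (hypothetical path/extension).
    dd = pd.read_excel("FN1_DD.xlsx", header=None, names=columns)

    # Distribution of annotation states: r (fake news), a (truth),
    # q (question), n (unrelated).
    print(dd["state"].value_counts())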

    DG

    DG for each fake news contains two files:

    A file in graph format (.graph), which includes the structure of the graph, i.e., who is linked to whom (named FNx_DG.graph, where x is the number of the fake news story).

    A file in JSON Lines format (.jsonl), which includes the real user IDs of the nodes in the graph file (named FNx_Labels.jsonl, where x is the number of the fake news story).

    In the graph file, each node is labeled with the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the IDs of subsequent nodes follow, one per row (each row corresponds to one user ID). Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at the jsonl row whose number is the label plus one, i.e., row 201.
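
    A minimal sketch of that lookup in Python, assuming one user ID per jsonl line after the header row (the exact per-line format is an assumption; check the dataset readme):

    import json

    # Map graph node label -> real Twitter user ID for fake news story 1.
    label_to_id = {}
    with open("FN1_Labels.jsonl") as f:
        next(f)  # skip row 0, which holds the column labels
        for label, line in enumerate(f):
            label_to_id[label] = json.loads(line)

    print(label_to_id[200])  # real user ID of the node labeled 200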

    The user IDs of spreaders in DG (those who have a post in DD) are also present in DD, where extra information about them and their tweets/retweets is available. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.

  4. Master X-Ray Catalog - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Master X-Ray Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/master-x-ray-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases.

    Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list.

    This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.

  5. Annual Respondents Database X, 1997-2020: Secure Access

    • beta.ukdataservice.ac.uk
    Updated 2024
    Cite
    Office For National Statistics (ONS) (2024). Annual Respondents Database X, 1997-2020: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-7989-5
    Explore at:
    Dataset updated
    2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    DataCite (https://www.datacite.org/)
    Authors
    Office For National Statistics (ONS)
    Description

    The Annual Respondents Database X (ARDx) has been created to allow users of the Annual Respondents Database (ARD) (held at the UK Data Archive under SN 6644) to continue analysis even though the Annual Business Inquiry (ABI), which was used to create the ARD, ceased in 2008. ARDx contains harmonised variables from 1997 to 2020.

    ARDx is created from two ONS surveys, the Annual Business Inquiry (ABI; 1998-2008, held at the UK Data Archive under SN 6644) and the Annual Business Survey (ABS; 2009 onwards, held at the UK Data Archive under SN 7451). The ABI has an employment survey (ABI1) and a second survey for financial information (ABI2). ABS only collects financial data, and so is supplemented with employment data from the Business Register and Employment Survey (BRES; 2009 onwards, held at the UK Data Archive under SN 7463).

    ARDx consists of six types of files: 'respondent files', which have reported and derived information from survey questionnaire responses, and 'universe files', which contain limited information on all businesses that are within scope of the ABI/ABS. Both are provided at the Reporting Unit and Local Unit levels. There are also 'register panel' and 'capital stock' files.

    Linking to other business studies
    These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

    For the fifth edition (December 2023), ARDx Version 4.0 for 1997-2020 has been provided, replacing Version 3. Coverage has thus been expanded to include 1997 and 2015-2020.

    Note to users
    Due to the limited nature of the documentation available for ARDx, users are advised to consult the documentation for the Annual Business Survey (UK Data Archive SN 7451) for detailed information about the data.

    For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.

  6. Master X-Ray Catalog

    • s.cnmilf.com
    • catalog.data.gov
    Updated Jun 28, 2025
    Cite
    High Energy Astrophysics Science Archive Research Center (2025). Master X-Ray Catalog [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/master-x-ray-catalog
    Explore at:
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    High Energy Astrophysics Science Archive Research Center
    Description

    The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases.

    Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list.

    This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.

  7. Russian Foreign Ministry Twitter Accounts Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Sep 29, 2024
    Cite
    Benjamin Shultz; Benjamin Shultz; E Rosalie Li; E Rosalie Li (2024). Russian Foreign Ministry Twitter Accounts Dataset [Dataset]. http://doi.org/10.5281/zenodo.11489527
    Explore at:
    Available download formats: bin
    Dataset updated
    Sep 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Shultz; Benjamin Shultz; E Rosalie Li; E Rosalie Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 5, 2024
    Area covered
    Russia
    Description

    This publication introduces a novel dataset of 403 diplomatic X/Twitter accounts belonging to the Russian government (primarily the Russian Foreign Ministry), with accompanying metadata. These accounts have become a known vector in the spread of false and misleading information around the Russian invasion of Ukraine; however, given new restrictions on the accessibility of the X/Twitter API and the visibility of users' following lists, the vast majority of these accounts are no longer easily discoverable by researchers. The primary aim of publishing this dataset is to provide a comprehensive resource for further analysis of this disinformation vector.

  8. Dataset for: A technique for in-situ displacement and strain measurement with laboratory-scale X-ray Computed Tomography

    • catalog.data.gov
    • data.nist.gov
    Updated Mar 14, 2025
    + more versions
    Cite
    National Institute of Standards and Technology (2025). Dataset for: A technique for in-situ displacement and strain measurement with laboratory-scale X-ray Computed Tomography [Dataset]. https://catalog.data.gov/dataset/dataset-for-a-technique-for-in-situ-displacement-and-strain-measurement-with-laboratory-sc-f123a
    Explore at:
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This record provides raw and post-processed data used in the associated paper "A technique for in-situ displacement and strain measurement with laboratory-scale X-ray Computed Tomography." The codes used are provided in a separate software publication, "SerialTrackXR", also referenced. The data consist of 3D X-ray computed tomography (X-ray CT) scans, projection images, and load/displacement data for two additively manufactured tensile test coupons made from IN718 with different processing conditions. The 3D images were collected in-situ at progressively increasing levels of applied displacement. The projection images track this displacement both in total and as surface maps. Load/displacement data from the load frame used to apply displacement are also provided. A displacement-tracking validation dataset, consisting of known rigid-body displacements imposed on a nominally un-deformed third test specimen, is also included. The X-ray CT data are rather large, each .tiff stack being about 2 GB; the displacement and strain map files are also >1 GB. Other data are relatively smaller. The dataset consists of 170 files, totaling 69.9 GB (as noted above, much of this space is in 3D images and raw data for the 3D images; most users will not need to interact with those raw data).

  9. Medical Imaging (CT-Xray) Colorization New Dataset

    • kaggle.com
    Updated Mar 18, 2025
    Cite
    Shuvo Kumar Basak-4004.o (2025). Medical Imaging (CT-Xray) Colorization New Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11072909
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shuvo Kumar Basak-4004.o
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Medical Imaging (CT-Xray) Colorization New Dataset šŸ©ŗšŸ’»šŸ–¼ļø

    This dataset provides a collection of medical imaging data, including both CT (Computed Tomography) and X-ray images, with an added focus on colorization techniques. The goal of this dataset is to facilitate the enhancement of diagnostic processes by applying various colorization techniques to grayscale medical images, allowing researchers and machine learning models to explore the effects of color in radiology.

    Key Features:

    • CT and X-ray Images šŸ„: Contains both CT scans and X-ray images, widely used in medical diagnostics.
    • Colorized Medical Images 🌈: Each image has been colorized using advanced methods to improve visual interpretation and analysis, including details that might not be immediately obvious in grayscale images.
    • New Dataset šŸ“Š: This dataset is newly created to provide high-quality colorized medical imaging, ideal for training AI models in medical image analysis and enhancing diagnostic accuracy.

    Methods Used for Colorization:

    • Basic Color Map Application šŸŽØ: Applying standard color maps to highlight structures in CT and X-ray images.
    • Adaptive Histogram Equalization (CLAHE) šŸ”: Adaptive enhancement to improve contrast and highlight important features, especially in medical contexts.
    • Contrast Stretching šŸ“ˆ: Adjusting image intensity to enhance visual details and improve diagnostic quality.
    • Gaussian Blur šŸŒ€: Applied to reduce noise, offering a smoother image for better processing.
    • Edge Detection (Canny) ✨: Detecting edges and contours, useful for identifying specific features in medical scans.
    • Random Color Palettes šŸŽØ: Using randomized color schemes for unique visual representations.
    • Gamma Correction 🌟: Adjusting image brightness to reveal more information hidden in the shadows.
    • LUT (Lookup Table) Color Mapping šŸ’”: Applying predefined color lookups for visually appealing representations.
    • Alpha Blending šŸ”¶: Blending colorized regions based on certain thresholds to highlight structures or anomalies.
    • 3D Rendering šŸ”ŗ: For creating 3D-like visualizations from 2D scans.
    • Heatmap Visualization šŸ”„: Highlighting areas of interest, such as anomalies or tumors, using heatmap color gradients.
    • Interactive Segmentation šŸ–±ļø: Interactive visualizations that help in segmenting regions of interest in medical images.

    Applications šŸ„šŸ’”

    This dataset has numerous applications, particularly in the fields of medical image analysis, AI development, and diagnostic improvement. Some of the major applications include:

    • Medical Diagnostics Enhancement šŸ”: Colorization can aid radiologists in interpreting CT and X-ray images by making abnormalities more visible. Helps in visualizing tumors, fractures, or other anomalies, especially in cases where grayscale images are hard to interpret.
    • AI and Machine Learning for Healthcare šŸ¤–: Used for training deep learning models in image segmentation, detection, and classification of diseases (e.g., cancer detection). AI models can be trained on these colorized images to improve accuracy in diagnostic tools, leading to early disease detection.
    • Medical Image Enhancement šŸ–¼ļø: Enables improved contrast, better detail visibility, and highlighting of specific anatomical regions using color. Colorization may improve the accuracy of radiological assessments by allowing professionals to more easily spot abnormalities and changes over time.
    • Data Augmentation for Model Training šŸ“š: The colorized images can serve as an additional data source for training AI models, increasing model robustness through synthetic data generation. Various colorization methods (like heatmaps and random palettes) can be used to augment image variations, improving model performance under different conditions.
    • Visualizing Anomalies for Anomaly Detection šŸ”„: Heatmap visualization helps detect subtle and hidden anomalies by coloring the areas of interest with intensity, enabling faster identification of potential issues. Edge detection and segmentation techniques enhance the ability to detect the edges and boundaries of tumors, fractures, and other critical features.
    • 3D Image Rendering for Detailed Analysis 🧠: 3D rend...

  10. MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

    • zenodo.org
    Updated May 24, 2025
    + more versions
    Cite
    Anonymous; Anonymous (2025). MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane [Dataset]. http://doi.org/10.5281/zenodo.15401479
    Explore at:
    Dataset updated
    May 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

    We present a Multiplatform Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 98,662 relevant social media posts from Reddit, X, TikTok, and YouTube.
    In addition, all relevant posts are annotated on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes in a multi-modal approach that considers both textual and visual content (text, images, and videos), providing a rich labeled dataset for in-depth analysis.
    The dataset is also complemented by an Online Analytics Platform (https://hurricane.web.illinois.edu/) that allows users not only to view hurricane-related posts and articles, but also to explore high-frequency keywords, user sentiment, and the locations where posts were made.
    To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated hurricane dataset. We envision that MASH can contribute to the study of hurricanes' impact on society, such as disaster severity classification, event detection, public sentiment analysis, and bias identification.

    Usage Notice

    This dataset includes four annotation files:
    • reddit_anno_publish.csv
    • tiktok_anno_publish.csv
    • twitter_anno_publish.csv
    • youtube_anno_publish.csv
    Each file contains post IDs and corresponding annotations on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes.
    To protect user privacy, only post IDs are released. We recommend retrieving the full post content via the official APIs of each platform, in accordance with their respective terms of service.

    Humanitarian Classes

    Each post is annotated with seven binary humanitarian classes. For each class, the label is either:
    • True – the post contains this humanitarian information
    • False – the post does not contain this information
    These seven humanitarian classes include:
    • Casualty: The post reports people or animals who are killed, injured, or missing during the hurricane.
    • Evacuation: The post describes the evacuation, relocation, rescue, or displacement of individuals or animals due to the hurricane.
    • Damage: The post reports damage to infrastructure or public utilities caused by the hurricane.
    • Advice: The post provides advice, guidance, or suggestions related to hurricanes, including how to stay safe, protect property, or prepare for the disaster.
    • Request: The post requests help, support, or resources due to the hurricane.
    • Assistance: The post describes physical aid or emotional or psychological support provided by individuals, communities, or organizations.
    • Recovery: The post describes efforts or activities related to the recovery and rebuilding process after the hurricane.
    Note: A single post may be labeled as True for multiple humanitarian categories.

    Bias Classes

    Each post is annotated with five binary bias classes. For each class, the label is either:
    • True – the post contains this bias information
    • False – the post does not contain this information
    These five bias classes include:
    • Linguistic Bias: The post contains biased, inappropriate, or offensive language, with a focus on word choice, tone, or expression.
    • Political Bias: The post expresses political ideology, showing favor or disapproval toward specific political actors, parties, or policies.
    • Gender Bias: The post contains biased, stereotypical, or discriminatory language or viewpoints related to gender.
    • Hate Speech: The post contains language that expresses hatred, hostility, or dehumanization toward a specific group or individual, especially those belonging to minority or marginalized communities.
    • Racial Bias: The post contains biased, discriminatory, or stereotypical statements directed toward one or more racial or ethnic groups.
    Note: A single post may be labeled as True for multiple bias categories.

    Information Integrity Classes

    Each post is also annotated with a single information integrity class, represented by an integer:
    • -1 → False information (i.e., misinformation or disinformation)
    • 0 → Unverifiable information (unclear or lacking sufficient evidence)
    • 1 → True information (verifiable and accurate)
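
    A minimal sketch of combining the three annotation dimensions for one platform with pandas; the exact column names ("post_id", "Casualty", "Information_Integrity") are assumptions, since only the general file structure is described above:

    import pandas as pd

    # Annotation file for X/Twitter posts (column names assumed).
    anno = pd.read_csv("twitter_anno_publish.csv")

    # Posts that report casualties and were annotated as false information.
    # Assumes the humanitarian columns load as booleans and the integrity
    # column as the integers -1/0/1 described above.
    flagged = anno[anno["Casualty"] & (anno["Information_Integrity"] == -1)]

    # Only post IDs are released; rehydrate full content via each platform's
    # official API, per the usage notice above.
    print(flagged["post_id"].head(10))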

    Key Notes

    1. This dataset is also available at https://huggingface.co/datasets/YRC10/MASH.
    2. Version 1 is no longer available.
  11. #ChatGPT 1000 Daily 🐦 Tweets

    • opendatabay.com
    .csv
    Updated Jun 18, 2025
    Cite
    Datasimple (2025). #ChatGPT 1000 Daily 🐦 Tweets [Dataset]. https://www.opendatabay.com/data/ai-ml/2cf951da-3ce1-4606-a8d6-3f865c4d8a3b
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    UPDATE: Due to the new Twitter API conditions introduced by Elon Musk, it is no longer free to use the Twitter (X) API; the pricing is $100/month for the hobby plan. My automated ETL notebook therefore stopped adding new tweets to this dataset on May 13th, 2023.

    This dataset was updated every day with the addition of 1,000 tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets are uploaded 24-72 h later, so the counters of likes, retweets, replies and impressions get enough time to become meaningful. Tweets are in any language, selected randomly from all hours of the day. Some basic filters are applied to try to discard sensitive tweets and spam.

    This dataset can be used for many different applications, from data analysis and visualization to NLP techniques such as sentiment analysis, and more.

    Consider upvoting this Dataset and the scheduled ETL Notebook that provided new data every day, if you found them interesting. Thanks! šŸ¤—

    Columns Description:

    tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.

    tweet_created: Timestamp. Time of the tweet's creation.

    tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).

    text: String. The raw payload text from the tweet.

    lang: String. Short name for the Tweet text's language.

    user_id: Integer. Twitter's unique user id.

    user_name: String. The author's public name on Twitter.

    user_username: String. The author's Twitter account username (@example)

    user_location: String. The author's public location.

    user_description: String. The author's public profile's bio.

    user_created: Timestamp. Timestamp of user's Twitter account creation.

    user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction

    user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction

    user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.

    user_verified: Boolean. True if the user is verified (blue mark).

    source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).

    retweet_count: Integer. Number of retweets of the Tweet at the moment of the Tweet extraction.

    like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.

    reply_count: Integer. Number of reply messages to the Tweet.

    impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.

    More info: Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
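
    As a starting point for the analyses mentioned above, here is a minimal sketch that loads one extract (hypothetical file name) and surfaces the most-liked tweet per language, using the documented columns:

    import pandas as pd

    # Load one daily extract (hypothetical file name).
    tweets = pd.read_csv("chatgpt_daily_tweets.csv",
                         parse_dates=["tweet_created", "tweet_extracted"])

    # Tweet volume by language.
    print(tweets["lang"].value_counts().head())

    # Most-liked tweet per language, using the engagement counters captured
    # at extraction time.
    top = tweets.sort_values("like_count", ascending=False).groupby("lang").head(1)
    print(top[["lang", "user_username", "like_count", "text"]])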

    License

    CC0

    Original Data Source: https://www.kaggle.com/datasets/edomingo/chatgpt-1000-daily-tweets

  12. šŸ¤– Tweets and Reactions on DeepSeek šŸ‹

    • kaggle.com
    Updated Jan 29, 2025
    Cite
    BwandoWando (2025). šŸ¤– Tweets and Reactions on DeepSeek šŸ‹ [Dataset]. http://doi.org/10.34740/kaggle/ds/6566754
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Background Context

    Unless you've been living under a rock: in just these past few weeks, a startup company based in China, DeepSeek, has released models that have taken the AI-tech giants (OpenAI, Meta, Anthropic, Alibaba, etc.) by surprise and could potentially disrupt and shake the foundations of the AI industry. Here's a summary:

    What is DeepSeek and why is it disrupting the AI sector?

    Chinese startup DeepSeek's launch of its latest AI models, which it says are on a par or better than industry-leading models in the United States at a fraction of the cost, is threatening to upset the technology world order. The company has attracted attention in global AI circles after writing in a paper last month that the training of DeepSeek-V3 required less than $6 million worth of computing power from Nvidia H800 chips.

    And here's one of the many YouTube news videos discussing it

    https://www.youtube.com/watch?v=WEBiebbeNCA

    A (Very) Good Summary

    Summary by MorganB in a series of Tweets

    Dataset Contents

    This dataset contains tweets and reactions about DeepSeek and the models they released, as well as other closely related keywords, such as NVIDIA, OPENAI, ANTHROPIC, META, LLAMA, etc.

    Important note āš ļø

    There may be some tweets included that are totally unrelated to AI and DeepSeek, as they contain some of the keywords that I used.

    Source

    I signed up for a trial with https://twitterapi.io/ , check it out!

    Image

    Generated with Bing image generator

  13. Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset

    • data.niaid.nih.gov
    Updated Oct 6, 2021
    Cite
    Geoffroy Peeters (2021). Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5552287
    Explore at:
    Dataset updated
    Oct 6, 2021
    Dataset provided by
    Karim M. Ibrahim
    Geoffroy Peeters
    Gaƫl Richard
    Elena V. Epure
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a contextual music dataset labeled with the listening situation associated with each stream. Each stream is composed of the user, track, and device data labelled with a situation. The dataset was collected from Deezer for the period of August 2019, from France and Brazil. It is composed of 3 subsets of situations corresponding to 4, 8, and 12 different situations. The situations are extracted based on keyword matching with the associated playlist title in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club".

    Each instance contains the track/user/device triplet and a situational tag indicating that this user listens to the track in the associated situation, with the corresponding data received from the device. The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". The users are represented as embeddings based on their listening history, computed through matrix factorization of the user/track matrix. Additionally, the users are also represented by their demographic data: "age, country, gender".

    The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaƫl Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git

    The dataset is composed of the media_id, which is the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented with an anonymized user_id which is associated with the user embedding available in the user_embeddings.npy file. Note: The index of the embeddings in the user_embeddings array corresponds to the user_id, i.e. user_id 100 has its embeddings at user_embeddings[100].
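
    A minimal sketch of that lookup with NumPy, assuming user_embeddings.npy holds a 2-D array whose row index is the anonymized user_id, as described above:

    import numpy as np

    # Embeddings from matrix factorization of the user/track matrix.
    user_embeddings = np.load("user_embeddings.npy")

    # The row index corresponds to the anonymized user_id.
    user_id = 100
    embedding = user_embeddings[user_id]
    print(embedding.shape)  # dimensionality of the user representation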

    Finally, the dataset also contains the splits used in our experiments. Our splits were conditioned by one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is split into 4 subsets for cross-validation marked with a "fold" number in each condition.

  14. Measurement-based MIMO channel model at 140GHz

    • zenodo.org
    zip
    Updated Apr 6, 2024
    Cite
    Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyƶsti; Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyƶsti (2024). Measurement-based MIMO channel model at 140GHz [Dataset]. http://doi.org/10.5281/zenodo.7640353
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyƶsti; Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyƶsti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Introduction

    The file "gen_dd_channel.zip" is a package of a wideband multiple-input multiple-output (MIMO) stored radio channel model at 140 GHz in indoor hall, outdoor suburban, residential and urban scenarios. The package consists of (1) measured wideband double-directional multipath data sets estimated from radio channel sounding and processed through measurement-based ray-launching, and (2) MATLAB code sets that allow users to generate wideband MIMO radio channels with various antenna array types, e.g., uniform planar and circular arrays at link ends.

    2. What does this package do?

    Outputs of the channel model

    The MATLAB file "ChannelGeneratorDD_hexax.m" gives the following variables, among others. The .m file also gives optional figures illustrating antennas and radio channel responses.

    • CIR: MIMO channel impulse responses

    • CFR: MIMO channel frequency responses

    Inputs to the channel model

    In order for the MATLAB file "ChannelGeneratorDD_hexax.m" to run properly, the following input is required.

    • data_030123_double_directional_paths: Double-directional multipath data, measured and complemented by a ray-launching tool, for various cellular sites.

    User's parameters

    When using "ChannelGeneratorDD_hexax.m", the following choices are available.

    Channel model types for transfer function generation:

    • 'snapshot': single time sample per link = static, random phase for each path, amplitude from measurements

    • 'virtualMotion': Doppler shifts & temporal fading, static propagation parameters, random phase for each path, amplitude from measurements, Doppler frequency per path from AoA and velocity vector

    Antenna / beam shapes:

    • 'single3GPP': single antenna element with power pattern shape defined in 3GPP, adjustable HPBW etc.

    • 'URA': uniform rectangular array, omni-directional elements

    • 'UCA': uniform circular array, omni-directional elements

    List of files in the dataset

    MATLAB codes that implement the channel model

    The MATLAB files consist of the following:

    • readme_100223.txt: Readme file; please read it before using the files.

    • ChannelGeneratorDD_hexax.m: Main code to run; a code to integrate antenna arrays and double-directional path data to derive MIMO radio channels. No need to see/edit other files.

    • gen_pathDD.m, randl.m, randLoc.m: Sub-routines used in ChannelGeneratorDD_hexax.m; no need of modifications.

    • Hexa-X channel generator DD_presentation.pdf: User manual of ChannelGeneratorDD_hexax.m.

    Measured multipath data

    The directory "data_030123_double_directional_paths" in the package contains the following files:

    • readme_100223.txt: Readme file; please read it before using the files.

    • RTdata_[scenario]_[date].mat: Double-directional multipath parameters at 140 GHz in the specified scenario, estimated from radio channel sounding and ray-tracing.

    • description_of_data_dd_[scenario].pdf: Explains data formats, the measurement site and sample results.

    References

    Details of the data set are available in the following two documents:

    The stored channel models

    A. Nimr (ed.), "Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G," April 2023, available: https://hexa-x.eu/deliverables/

    @misc{Hexa-XD23,
    author = {{A. Nimr (ed.)}},
    title = {{Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G}},
    year = {2023},
    month = {Apr.},
    howpublished = {https://hexa-x.eu/deliverables/},
    }

    Derivation of the data, i.e., radio channel sounding and measurement-based ray-launching

    M. F. De Guzman and K. Haneda, "Analysis of wave-interacting objects in indoor and outdoor environments at 142 GHz," IEEE Transactions on Antennas and Propagation, vol. 71, no. 12, pp. 9838-9848, Dec. 2023, doi: 10.1109/TAP.2023.3318861

    @ARTICLE{DeGuzman23_TAP,
    author={De Guzman, Mar Francis and Haneda, Katsuyuki},
    journal={IEEE Transactions on Antennas and Propagation},
    title={Analysis of Wave-Interacting Objects in Indoor and Outdoor Environments at 142 {GHz}},
    year={2023},
    volume={71},
    number={12},
    pages={9838-9848},
    }

    Finally, the code "randl.m" is from the following MATLAB Central File Exchange entry.

    Hristo Zhivomirov (2023). Generation of Random Numbers with Laplace Distribution (https://www.mathworks.com/matlabcentral/fileexchange/53397-generation-of-random-numbers-with-laplace-distribution), MATLAB Central File Exchange. Retrieved February 15, 2023.

    Data usage terms

    Any usage of the data is subject to the following conditions:

    • The file "ChannelGeneratorDD_hexax.m" is owned by OUL. Contact: Dr. Pekka Kyƶsti, Pekka.Kyosti@oulu.fi.
    • The other files and those in the directories, except for "randl.m", are owned by AAU. Contact: Mr. Mar Francis de Guzman, francis.deguzman@aalto.fi.
    • When a scientific paper is published that exploits the data and code, please cite this data set; the citation can be downloaded from the zenodo page of this data set.
  15. Arab Computational Propaganda on X (Twitter)

    • data.mendeley.com
    Updated Sep 27, 2023
    + more versions
    Cite
    Bodor Almotairy (2023). Arab Computational Propaganda on X (Twitter) [Dataset]. http://doi.org/10.17632/58mttpbc7x.2
    Explore at:
    Dataset updated
    Sep 27, 2023
    Authors
    Bodor Almotairy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database includes five datasets. Three datasets were extracted from a dataset published by X (Twitter Transparency website) that includes tweets from malicious accounts trying to manipulate public opinion in the Kingdom of Saudi Arabia. We focused on sports and banking topics when extracting data. Although the propagandist tweets were published by malicious accounts, as X (Twitter) stated, the tweets themselves were not classified as propaganda or not. Propagandists usually mix propaganda and non-propaganda tweets in an attempt to hide their identities. Therefore, it was necessary to classify their tweets as propaganda or not, based on the propaganda technique used. Since the datasets are very large, we annotated a sample of 2,100 tweets. As for reliable account data, we were keen to identify reliable Saudi sources. Then, their tweets that discussed the same topics discussed by the malicious users were crawled. There are two datasets for reliable users, covering sports and banking topics. The dataset is made up of 16,355,558 tweets from propagandist users and 156,524 tweets from reliable users for the time period of January 1, 2019, to December 31, 2020.

  16. CSU Synthetic Attribution Benchmark Dataset

    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). CSU Synthetic Attribution Benchmark Dataset [Dataset]. http://doi.org/10.34911/rdnt.8snx6c
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Description

    This is a synthetic dataset for users interested in benchmarking methods of explainable artificial intelligence (XAI) for geoscientific applications. The dataset is inspired by a climate forecasting setting (seasonal timescales) where the task is to predict regional climate variability given global climate information lagged in time. It consists of a synthetic input X (a series of 2D arrays of random fields drawn from a multivariate normal distribution) and a synthetic output Y (a scalar series) generated by a nonlinear function F: R^d -> R.

    The synthetic input aims to represent temporally independent realizations of anomalous global fields of sea surface temperature, the synthetic output series represents some type of regional climate variability that is of interest (temperature, precipitation totals, etc.) and the function F is a simplification of the climate system.

    Since the nonlinear function F that generates the output from the input is known, we also derive and provide the attribution of each output value to the corresponding input features. Using this synthetic dataset, users can train any AI model to predict Y given X and then implement XAI methods to interpret it. Based on the "ground truth" attribution of F, the user can assess the faithfulness of any XAI method.
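
    A minimal sketch of that workflow in Python under assumed file names and shapes (the actual NetCDF layout is documented with the dataset); the linear model and the input-times-gradient attribution are deliberately simple stand-ins for a real model and XAI method:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical file names; assumed shapes: X (n, ny, nx) input fields,
    # y (n,) output series, attr_true (n, ny, nx) ground-truth attributions.
    X = np.load("synthetic_X.npy")
    y = np.load("synthetic_Y.npy")
    attr_true = np.load("attribution.npy")

    n = X.shape[0]
    X_flat = X.reshape(n, -1)
    split = n // 2

    model = Ridge().fit(X_flat[:split], y[:split])

    # For a linear model the gradient is the coefficient vector, so
    # "input x gradient" reduces to an elementwise product.
    attr_est = X_flat[split:] * model.coef_

    # Faithfulness: per-sample correlation with the ground-truth attribution.
    truth = attr_true.reshape(n, -1)[split:]
    corrs = [np.corrcoef(attr_est[i], truth[i])[0, 1] for i in range(n - split)]
    print("mean correlation:", np.nanmean(corrs))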

    NOTE: the spatial configuration of the observations in the NetCDF database file conform to the planetocentric coordinate system (89.5N - 89.5S, 0.5E - 359.5E), where longitude is measured in the positive heading east from the prime meridian.

  17. Fedivertex

    • kaggle.com
    Updated May 8, 2025
    Cite
    Marc Damie (2025). Fedivertex [Dataset]. http://doi.org/10.34740/kaggle/ds/6877842
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Marc Damie
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    [Figure: Peertube "follow" graph. The colours correspond to the language of the server (purple: unknown, green: French, blue: English, black: German, orange: Italian, grey: others).]

    Introduction

    Decentralized machine learning---where each client keeps its own data locally and uses its own computational resources to collaboratively train a model by exchanging peer-to-peer messages---is increasingly popular, as it enables better scalability and control over the data. A major challenge in this setting is that learning dynamics depend on the topology of the communication graph, which motivates the use of real graph datasets for benchmarking decentralized algorithms. Unfortunately, existing graph datasets are largely limited to for-profit social networks crawled at a fixed point in time and often collected at the user scale, where links are heavily influenced by the platform and its recommendation algorithms. The Fediverse, which includes several free and open-source decentralized social media platforms such as Mastodon, Misskey, and Lemmy, offers an interesting real-world alternative. We introduce Fedivertex, a new dataset covering seven social networks from the Fediverse, crawled on a weekly basis.

    We refer to our paper for a detailed presentation of the graphs: [SOON]

    Usage

    Python

    We implemented a simple Python API to interact easily with the dataset: https://pypi.org/project/fedivertex/

    pip3 install fedivertex
    

    This package automatically downloads the dataset and generates NetworkX graphs.

    from fedivertex import GraphLoader
    
    loader = GraphLoader()  # instantiate the loader before use
    
    loader.list_graph_types("mastodon")
    # List available graphs for a given software, here federation and active_user
    
    G = loader.get_graph(software = "mastodon", graph_type = "active_user", index = 0, only_largest_component = True)
    # G contains the NetworkX graph of the giant component of the active users graph at the 1st date of collection
    

    We also provide a Kaggle notebook demonstrating simple operations using this library: https://www.kaggle.com/code/marcdamie/exploratory-graph-data-analysis-of-fedivertex

    Available graphs

    The dataset contains graphs crawled on a weekly basis from 7 social networks in the Fediverse. Each graph quantifies/characterizes interaction differently, depending on the information provided by the public API of each network.

    We briefly present the graphs below (NB: the term "instance" refers to servers on the Fediverse):

    • [Bookwyrm/Friendica/Lemmy/Mastodon/Misskey/Pleroma] "federation" graphs: If two instances know each other they are connected in this graph. The federation graph then corresponds to the undirected communication graph between instances.
    • Peertube "follow" graphs: On Peertube, an instance X can follow an instance Y to let its users see all the videos posted on Y. This graph is a directed graph with edges of weight 1 for following.
    • Lemmy "federation with blocks" graphs: This graph completes the federation graph with negative edges when an instance X blocks instance Y. The graph is directed.
    • Lemmy "cross-instance" graphs: two instances are connected as soon as there exists a pair of users who published a message in the same thread, but possibly on a third instance. This is an undirected graph, less sparse than "intra-instance".
    • Lemmy "intra-instance" graphs: the instance X is linked to Y if an user of X has published a message on instance Y. This graph is directed and very sparse.
    • [Mastodon/Misskey/Pleroma] "active users" graphs: For each instance, we consider the set of the 10K most recently active users. Then, for each user of an instance X, we consider the list of users they follow, and add 1 to the edge from X to Y, where Y is the instance of the followed user. The weight of the edge from X to Y thus encodes how much of the content seen on instance X is generated on instance Y. Note that this graph contains self-loops.

    These graphs provide diverse perspectives on the Fediverse, as they capture more or less subtle phenomena. For example, "federation" graphs are dense, while "intra-instance" graphs are sparse. We have performed a detailed exploratory data analysis in this notebook.

    Gephi

    Our CSV files are formatted so that they can be directly imported into Gephi for graph visualization. Below is an example Gephi visualization of the Misskey "active users" graph (without the misskey.io node). The colours correspond to the language of the server (purple: unknown, red: Japanese, brown: Korean, blue: English, yellow: Chinese).

    [Figure: Misskey "active users" graph]

  18. TRMM (TMPA-RT) Near Real-Time IR precipitation estimate L3 1-hour 0.25 degree x 0.25 degree V7 (TRMM_3B41RT) at GES DISC

    • datasets.ai
    • cmr.earthdata.nasa.gov
    • +3more
    21, 33, 34
    Updated Aug 8, 2024
    + more versions
    Cite
    National Aeronautics and Space Administration (2024). TRMM (TMPA-RT) Near Real-Time IR precipitation estimate L3 1-hour 0.25 degree x 0.25 degree V7 (TRMM_3B41RT) at GES DISC [Dataset]. https://datasets.ai/datasets/trmm-tmpa-rt-near-real-time-ir-precipitation-estimate-l3-1-hour-0-25-degree-x-0-25-degree-
    Explore at:
    Available download formats: 21, 33, 34
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    NASA (http://nasa.gov/)
    Authors
    National Aeronautics and Space Administration
    Description

    The TMPA (3B41RT) dataset has been discontinued as of Dec. 31, 2019; users are strongly encouraged to shift to the successor IMERG datasets (doi: 10.5067/GPM/IMERG/3B-HH-E/06, 10.5067/GPM/IMERG/3B-HH-L/06).

    These data were output from the TRMM Multi-satellite Precipitation Analysis (TMPA), the Near Real-Time (RT) processing stream. The latency was about seven hours from the observation time, although processing issues could delay or prevent this schedule. Users should be mindful that the price for the short latency of these data is reduced quality compared to the research-quality product 3B42. This particular dataset is an intermediate variable (VAR) rain-rate IR estimate.

    Data files start with a header consisting of a 2880-byte record containing ASCII characters. The header line makes the file nearly self-documenting, in particular spelling out the variable and version names, and giving the units of the variables.

    Immediately after the header follow 3 data fields, "precip", "error", and "# pixels", with byte counts of 1382400, 1382400, and 691200, respectively. The first two are 2-byte integers and the third is 1-byte. All fields are 1440x480 grid boxes (0-360E, 60N-S). The first grid-box center is at (0.125E, 59.875N). The grid increments most rapidly to the east. Grid boxes without valid data are filled with the (2-byte integer) missing value -31999. Valid estimates are only provided in the band 50N-S. These binary data sets are in IEEE big-endian byte order.
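
    A minimal sketch of parsing one granule with NumPy under the layout described above (the file name is hypothetical, and treating "# pixels" as unsigned is an assumption):

    import numpy as np

    path = "3B41RT.bin"  # hypothetical file name

    # 2880-byte ASCII header, nearly self-documenting.
    with open(path, "rb") as f:
        header = f.read(2880).decode("ascii", errors="replace")
    print(header[:80])

    # Three fields on a 1440 x 480 grid; east varies fastest, so rows are
    # latitude (480) and columns longitude (1440). Big-endian byte order.
    count = 1440 * 480
    precip = np.fromfile(path, dtype=">i2", count=count, offset=2880).reshape(480, 1440)
    error = np.fromfile(path, dtype=">i2", count=count, offset=2880 + 1382400).reshape(480, 1440)
    pixels = np.fromfile(path, dtype=np.uint8, count=count, offset=2880 + 2 * 1382400).reshape(480, 1440)

    # Mask grid boxes without valid data (documented fill value).
    precip = np.ma.masked_equal(precip, -31999)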

  19. Dataset of book subjects that contain The Contax RTS & Yashica SLR book :...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain The Contax RTS & Yashica SLR book : for Contax RTS, Yashica FR, DR1, FR11, FX-1, FX2̧, TL-Electro, Electro-X, Electro AX & TL Super users [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=The+Contax+RTS+%26+Yashica+SLR+book+:+for+Contax+RTS%2C+Yashica+FR%2C+DR1%2C+FR11%2C+FX-1%2C+FX2%CC%A7%2C+TL-Electro%2C+Electro-X%2C+Electro+AX+%26+TL+Super+users&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 1 row and is filtered where the book is The Contax RTS & Yashica SLR book : for Contax RTS, Yashica FR, DR1, FR11, FX-1, FX2̧, TL-Electro, Electro-X, Electro AX & TL Super users. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  20. [Tweets] 2023 Brazilian Early Political Events

    • data.niaid.nih.gov
    Updated Feb 7, 2025
    Cite
    Juvino Santos, Lucas RaniƩre (2025). [Tweets] 2023 Brazilian Early Political Events [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14834433
    Explore at:
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Balby Marinho, Leandro
    Juvino Santos, Lucas RaniƩre
    Campelo, Claudio
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    2022 Brazilian Presidential Election

    This dataset contains 13,910,048 tweets from 1,346,340 users, extracted using 157 search terms over 56 different days between January 1st and June 21st, 2023.

    All tweets in this dataset are in Brazilian Portuguese.

    Data Usage

    The dataset contains textual data from tweets, making it suitable for various NLP analyses, such as sentiment analysis, bias or stance detection, and toxic language detection. Additionally, users and tweets can be linked to create social graphs, enabling Social Network Analysis (SNA) to study polarization, communities, and other social dynamics.

    Extraction Method

    This dataset was extracted using Twitter's (now X) official API, back when Academic Research API access was still available, following this pipeline:

    1. Twitter/X daily monitoring: The dataset author monitored daily political events appearing in Brazil's Trending Topics. Twitter/X has an automated system for classifying trending terms. When a term was identified as political, it was stored along with its date for later use as a search query.

    2. Tweet collection using saved search terms: Once terms and their corresponding dates were recorded, tweets were extracted from 12:00 AM to 11:59 PM on the day the term entered the Trending Topics. A language filter was applied to select only tweets in Portuguese. The extraction was performed using the official Twitter/X API.

    3. Data storage: The extracted data was organized by day and search term. If the same search term appeared in Trending Topics on consecutive days, a separate file was stored for each respective day.

    Further Information

    For more details, visit:

    • The repository
    • Dataset short paper: DOI: 10.5281/zenodo.14834434
