The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After a ninth consecutive year of growth, the Twitter user base is estimated to reach a new peak of 85.08 million users in 2028. Notably, the number of Twitter users has increased continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
https://brightdata.com/license
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we have focused on fake news stories that gave rise to a simultaneous truth-spreading campaign against them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each providing a new dataset: 1) obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2) querying the neighbors of the tweet spreaders, which provides the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story. Each row corresponds to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about that tweet/retweet. From left to right, the columns contain the following information (a minimal loading sketch follows the list):
User ID (user who has posted the current tweet/retweet)
The profile description (bio) of the user who has published the tweet/retweet
The number of tweets/retweets the user had published at the time of posting the current tweet/retweet
Date and time of creation of the account that posted the current tweet/retweet
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before it was crawled
Number of times the current tweet had been retweeted before it was crawled
Whether another tweet is embedded in the current tweet/retweet (for example, when the current tweet is a quote, reply, or retweet)
The source (device/OS) from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply, then this feature gives the ID of the tweet that is replied to by the current post)
Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet appears in the dataset as a retweet posted by others)
State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet questions the fake news, neither confirming nor denying it
n : The tweet/retweet is not related to the fake news (it contains query terms related to the rumor but does not refer to the given fake news)
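Since the column order, not the header, is documented, here is a minimal pandas sketch for loading one DD file; the column names and the file extension are hypothetical placeholders assigned in the left-to-right order described above, so adjust them to the actual files.

```python
import pandas as pd

# Hypothetical column names, in the left-to-right order described above
# (adjust to the actual files, e.g. if they already contain a header row).
DD_COLUMNS = [
    "user_id", "user_description", "user_tweet_count", "account_created_at",
    "lang", "followers_count", "followings_count", "posted_at",
    "like_count", "retweet_count", "embedded_tweet", "source",
    "tweet_id", "retweet_id", "quote_id", "reply_id",
    "frequency", "state",
]

# Load the diffusion data for fake news story 1 (FN1_DD per the naming convention above).
dd = pd.read_excel("FN1_DD.xlsx", header=None, names=DD_COLUMNS)

# Example: distribution of annotation states (r = fake, a = truth, q = question, n = unrelated).
print(dd["state"].value_counts())
```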
DG
DG for each fake news contains two files:
A file in graph format (.graph) which includes the information of graph such as who is linked to whom. (This file named FNx_DG.graph, where x is the number of fake news)
A file in Jsonl format (.jsonl) which includes the real user IDs of nodes in the graph file. (This file named FNx_Labels.jsonl, where x is the number of fake news)
In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow in the subsequent rows (each row corresponds to one user ID). Therefore, if we want to know, for example, the user ID of node 200 (labeled 200 in the graph), we should look at row number 202 of the jsonl file.
The user IDs of spreaders in DG (those who have a post in DD) are also available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
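For illustration, a minimal sketch of mapping graph node labels back to real user IDs via the jsonl file is shown below; the file name follows the FNx_Labels.jsonl convention above, and the exact row offset should be checked against the description of the graph file.

```python
import json

# Read the label file: one value per line, with the first line holding the column label.
with open("FN1_Labels.jsonl") as f:
    rows = [json.loads(line) for line in f]

def user_id_of_node(label: int):
    # Node labeled k entered the graph k-th; its real user ID sits k + 1 rows down
    # (row 0 is the column label), per the description above.
    return rows[label + 1]

print(user_id_of_node(0))  # real user ID of the first node entered into the graph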
The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table either did not contain that particular parameter or had the same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that users always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.
The Annual Respondents Database X (ARDx) has been created to allow users of the Annual Respondents Database (ARD) (held at the UK Data Archive under SN 6644) to continue analysis even though the Annual Business Inquiry (ABI), which was used to create the ARD, ceased in 2008. ARDx contains harmonised variables from 1997 to 2020.
ARDx is created from two ONS surveys, the Annual Business Inquiry (ABI; 1998-2008, held at the UK Data Archive under SN 6644) and the Annual Business Survey (ABS; 2009 onwards, held at the UK Data Archive under SN 7451). The ABI has an employment survey (ABI1) and a second survey for financial information (ABI2). ABS only collects financial data, and so is supplemented with employment data from the Business Register and Employment Survey (BRES; 2009 onwards, held at the UK Data Archive under SN 7463).
ARDx consists of six types of files: 'respondent files', which have reported and derived information from survey questionnaire responses, and 'universe files', which contain limited information on all businesses that are within scope of the ABI/ABS. These files are provided at both the Reporting Unit and Local Unit levels. There are also 'register panel' and 'capital stock' files.
Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
For the fifth edition (December 2023), ARDx Version 4.0 for 1997-2020 has been provided, replacing Version 3. Coverage has thus been expanded to include 1997 and 2015-2020.
Note to users
Due to the limited nature of the documentation available for ARDx, users are advised to consult the documentation for the Annual Business Survey (UK Data Archive SN 7451) for detailed information about the data.
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This publication introduces a novel dataset of 403 diplomatic X/Twitter accounts belonging to the Russian government (primarily the Russian Foreign Ministry) and accompanying metadata. These accounts have become a known vector in the spread of false and misleading information around the Russian invasion of Ukraine, however, given new restrictions on the accessibility of the X/Twitter API and visibility of users' following lists, the vast majority of these accounts are no longer easily discoverable by researchers. The primary aim behind the publication of this dataset is to provide a comprehensive resource for further analysis of this disinformation vector.
This record provides raw and post-processed data used in the associated paper "A technique for in-situ displacement and strain measurement with laboratory-scale X-ray Computed Tomography." The codes used are provided in a separate software publication "SerialTrackXR", also referenced. The data consist of 3D X-ray computed tomography (X-ray CT) scans, projection images, and load/displacement data of two additively manufactured tensile test coupons made from IN718 with different processing conditions. The 3D images were collected in-situ at progressively increasing levels of applied displacement. The projection images track this displacement both total and as surface maps. Load/displacement data from the load frame used to apply displacement are also provided. A displacement tracking validation dataset, consisting of known rigid body displacements imposed on a nominally un-deformed third test specimen is also included.The X-ray CT data are rather large, each .tiff stack being about 2 GB; the displacement and strain map files are also >1 GB. Other data are relatively smaller. The dataset consists of 170 files, totaling 69.9 GB (as noted above, much of this space is in 3D images and raw data for the 3D images - most users will not need to interact with those raw data).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Medical Imaging (CT-Xray) Colorization New Dataset
This dataset provides a collection of medical imaging data, including both CT (Computed Tomography) and X-ray images, with an added focus on colorization techniques. The goal of this dataset is to facilitate the enhancement of diagnostic processes by applying various colorization techniques to grayscale medical images, allowing researchers and machine learning models to explore the effects of color in radiology.
Key Features:
CT and X-ray Images: Contains both CT scans and X-ray images, widely used in medical diagnostics.
Colorized Medical Images: Each image has been colorized using advanced methods to improve visual interpretation and analysis, including details that might not be immediately obvious in grayscale images.
New Dataset: This dataset is newly created to provide high-quality colorized medical imaging, ideal for training AI models in medical image analysis and enhancing diagnostic accuracy.
Methods Used for Colorization:
Basic Color Map Application: Applying standard color maps to highlight structures in CT and X-ray images.
Adaptive Histogram Equalization (CLAHE): Adaptive enhancement to improve contrast and highlight important features, especially in medical contexts.
Contrast Stretching: Adjusting image intensity to enhance visual details and improve diagnostic quality.
Gaussian Blur: Applied to reduce noise, offering a smoother image for better processing.
Edge Detection (Canny): Detecting edges and contours, useful for identifying specific features in medical scans.
Random Color Palettes: Using randomized color schemes for unique visual representations.
Gamma Correction: Adjusting image brightness to reveal more information hidden in the shadows.
LUT (Lookup Table) Color Mapping: Applying predefined color lookups for visually appealing representations.
Alpha Blending: Blending colorized regions based on certain thresholds to highlight structures or anomalies.
3D Rendering: For creating 3D-like visualizations from 2D scans.
Heatmap Visualization: Highlighting areas of interest, such as anomalies or tumors, using heatmap color gradients.
Interactive Segmentation: Interactive visualizations that help in segmenting regions of interest in medical images.
A minimal code sketch of the basic color-map and CLAHE steps is given after the applications list below.
Applications
This dataset has numerous applications, particularly in the field of medical image analysis, AI development, and diagnostic improvement. Some of the major applications include:
Medical Diagnostics Enhancement:
Colorization can aid radiologists in interpreting CT and X-ray images by making abnormalities more visible. Helps in visualizing tumors, fractures, or other anomalies, especially in cases where grayscale images are hard to interpret.
AI and Machine Learning for Healthcare:
Used for training deep learning models in image segmentation, detection, and classification of diseases (e.g., cancer detection). AI models can be trained on these colorized images to improve accuracy in diagnostic tools, leading to early disease detection.
Medical Image Enhancement:
Enables improved contrast, better detail visibility, and highlighting of specific anatomical regions using color. Colorization may improve the accuracy of radiological assessments by allowing professionals to more easily spot abnormalities and changes over time.
Data Augmentation for Model Training:
The colorized images can serve as an additional data source for training AI models, increasing model robustness through synthetic data generation. Various colorization methods (like heatmaps and random palettes) can be used to augment image variations, improving model performance under different conditions.
Visualizing Anomalies for Anomaly Detection:
Heatmap visualization helps detect subtle and hidden anomalies by coloring the areas of interest with intensity, enabling faster identification of potential issues. Edge detection and segmentation techniques enhance the ability to detect the edges and boundaries of tumors, fractures, and other critical features.
3D Image Rendering for Detailed Analysis:
3D rend...
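As a rough illustration of the basic color-map and CLAHE steps listed under "Methods Used for Colorization", here is a minimal OpenCV sketch; the file names are placeholders and not part of the dataset's documented layout.

```python
import cv2

# Load a grayscale scan (placeholder path; substitute any CT/X-ray image from the dataset).
gray = cv2.imread("sample_xray.png", cv2.IMREAD_GRAYSCALE)

# Adaptive Histogram Equalization (CLAHE) to boost local contrast before colorization.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

# Basic color map application: map grayscale intensities to a predefined LUT-style palette.
colorized = cv2.applyColorMap(enhanced, cv2.COLORMAP_JET)

cv2.imwrite("sample_xray_colorized.png", colorized)
```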
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
UPDATE: Due to new Twitter API conditions introduced by Elon Musk, it is no longer free to use the Twitter (X) API, and the pricing is $100/month on the hobby plan. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th, 2023.
This dataset was updated every day with 1,000 new tweets per day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets are uploaded 24-72 hours later, so the counters for likes, retweets, replies, and impressions have enough time to become relevant. Tweets are in any language and are selected randomly from all hours of the day. Some basic filters are applied to try to discard sensitive tweets and spam.
This dataset can be used for many different applications, including Data Analysis and Visualization as well as NLP Sentiment Analysis techniques and more.
Consider upvoting this Dataset and the ETL scheduled Notebook providing new data everyday into it if you found them interesting, thanks! š¤
Columns Description (a loading sketch follows the column list):
tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.
tweet_created: Timestamp. Time of the tweet's creation.
tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).
text: String. The raw payload text from the tweet.
lang: String. Short name for the Tweet text's language.
user_id: Integer. Twitter's unique user id.
user_name: String. The author's public name on Twitter.
user_username: String. The author's Twitter account username (@example)
user_location: String. The author's public location.
user_description: String. The author's public profile's bio.
user_created: Timestamp. Timestamp of user's Twitter account creation.
user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction
user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction
user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.
user_verified: Boolean. True if the user is verified (blue mark).
source: String. The device/app used to publish the tweet (apparently not working; all values are NaN so far).
retweet_count: Integer. Number of retweets of the Tweet at the moment of the Tweet extraction.
like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.
reply_count: Integer. Number of reply messages to the Tweet.
impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.
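As referenced above, a minimal pandas sketch for loading the dataset and summarising engagement is given below; the CSV file name is a placeholder, so use the actual daily files shipped with the dataset.

```python
import pandas as pd

# Load one daily tweets file (placeholder name) and parse the documented timestamp columns.
df = pd.read_csv("chatgpt_daily_tweets.csv", parse_dates=["tweet_created", "tweet_extracted"])

# Top languages by tweet volume.
print(df["lang"].value_counts().head())

# Average engagement per tweet, using the documented count columns.
print(df[["like_count", "retweet_count", "reply_count", "impression_count"]].mean())
```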
More info:
Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet
Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
CC0
Original Data Source: https://www.kaggle.com/datasets/edomingo/chatgpt-1000-daily-tweets
https://creativecommons.org/publicdomain/zero/1.0/
Unless you've been living under a rock, in just these past few weeks a startup company based in China, DeepSeek, has released models that have taken the AI-tech giants (OpenAI, Meta, Anthropic, Alibaba, etc.) by surprise and could potentially disrupt and shake the foundations of the AI industry. Here's a summary:
What is DeepSeek and why is it disrupting the AI sector?
Chinese startup DeepSeek's launch of its latest AI models, which it says are on a par or better than industry-leading models in the United States at a fraction of the cost, is threatening to upset the technology world order. The company has attracted attention in global AI circles after writing in a paper last month that the training of DeepSeek-V3 required less than $6 million worth of computing power from Nvidia H800 chips.
And here's one of the many YouTube news videos discussing it
https://www.youtube.com/watch?v=WEBiebbeNCA
Summary by MorganB in a series of Tweets
This dataset contains tweets and reactions about DeepSeek and the models they released, as well as other closely related keywords, such as NVIDIA, OPENAI, ANTHROPIC, META, LLAMA, etc.
There may be some tweets included that are totally unrelated to AI and DeepSeek, as they contain some of the keywords I used.
I signed up for a trial with https://twitterapi.io/ , check it out!
Generated with Bing image generator
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a contextual music dataset labeled with the listening situation associated with each stream. Each stream is composed of the user, track, and device data labelled with a situation. The dataset was collected from Deezer for the period of August 2019 from France and Brazil. The dataset is composed of 3 subsets of situations corresponding to 4, 8, and 12 different situations. The situations are extracted based on keyword matching with the associated playlist title in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club".
Each instance contains the track/user/device triplet and a situational tag indicating that this user listens to the track in the associated situation, with the corresponding data received from the device. The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". The users are represented as embeddings based on their listening history, computed through matrix factorization of the user/track matrix. Additionally, the users are also represented by their demographic data: "age, country, gender".
The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git
The dataset is composed of the media_id, which is the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented with an anonymized user_id, which is associated with the user embedding available in the user_embeddings.npy file. Note: the index of the embeddings in the user_embeddings array corresponds to the user_id, i.e., user_id = 100 has its embeddings at user_embeddings[100].
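A minimal sketch of the user_id-to-embedding lookup described above:

```python
import numpy as np

# Load the user embeddings; per the note above, the row index corresponds to the user_id.
user_embeddings = np.load("user_embeddings.npy")

user_id = 100  # example anonymized user_id from the dataset
embedding = user_embeddings[user_id]
print(embedding.shape)
```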
Finally, the dataset also contains the splits used in our experiments. Our splits were conditioned by one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is split into 4 subsets for cross-validation marked with a "fold" number in each condition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
The file "gen_dd_channel.zip" is a package of a wideband multiple-input multiple-output (MIMO) stored radio channel model at 140 GHz in indoor hall, outdoor suburban, residential and urban scenarios. The package consists of 1) measured wideband double-directional multipath data sets estimated from radio channel sounding and processed through measurement-based ray-launching and 2) MATLAB code sets that allow users to generate wideband MIMO radio channels with various antenna array types, e.g., uniform planar and circular arrays at the link ends.
2. What does this package do?
Outputs of the channel model
The MATLAB file "ChannelGeneratorDD_hexax.m" gives the following variables, among others. The .m file also gives optional figures illustrating antennas and radio channel responses.
Variables | Descriptions
CIR | MIMO channel impulse responses
CFR | MIMO channel frequency responses
Inputs to the channel model
In order for the MATLAB file "ChannelGeneratorDD_hexax.m" to run properly, the following inputs are required.
Directory | Descriptions
data_030123_double_directional_paths | Double-directional multipath data, measured and complemented by a ray-launching tool, for various cellular sites.
User's parameters
When using "ChannelGeneratorDD_hexax.m", the following choices are available.
Features | Choices
Channel model types for transfer function generation |
Antenna / beam shapes |
List of files in the dataset
MATLAB codes that implement the channel model
The MATLAB files consist of the following files.
File and directory names | Descriptions
readme_100223.txt | Readme file; please read it before using the files.
ChannelGeneratorDD_hexax.m | Main code to run; integrates antenna arrays and double-directional path data to derive MIMO radio channels. No need to see/edit other files.
gen_pathDD.m, randl.m, randLoc.m | Sub-routines used in ChannelGeneratorDD_hexax.m; no modifications needed.
Hexa-X channel generator DD_presentation.pdf | User manual of ChannelGeneratorDD_hexax.m.
Measured multipath data
The directory "data_030123_double_directional_paths" in the package contains the following files.
Filenames | Descriptions
readme_100223.txt | Readme file; please read it before using the files.
RTdata_[scenario]_[date].mat | Double-directional multipath parameters at 140 GHz in the specified scenario, estimated from radio channel sounding and ray-tracing.
description_of_data_dd_[scenario].pdf | Explains data formats, the measurement site and sample results.
References
Details of the data set are available in the following two documents:
The stored channel models
A. Nimr (ed.), "Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G," April 2023, available: https://hexa-x.eu/deliverables/
@misc{Hexa-XD23,
author = {{A. Nimr (ed.)}},
title = {{Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G}},
year = {2023},
month = {Apr.},
howpublished = {https://hexa-x.eu/deliverables/},
}
Derivation of the data, i.e., radio channel sounding and measurement-based ray-launching
M. F. De Guzman and K. Haneda, "Analysis of wave-interacting objects in indoor and outdoor environments at 142 GHz," IEEE Transactions on Antennas and Propagation, vol. 71, no. 12, pp. 9838-9848, Dec. 2023, doi: 10.1109/TAP.2023.3318861
@ARTICLE{DeGuzman23_TAP,
author={De Guzman, Mar Francis and Haneda, Katsuyuki},
journal={IEEE Transactions on Antennas and Propagation},
title={Analysis of Wave-Interacting Objects in Indoor and Outdoor Environments at 142 {GHz}},
year={2023},
volume={71},
number={12},
pages={9838-9848},
}
Finally, the code "randl.m" is from the following MATLAB Central File Exchange entry.
Hristo Zhivomirov (2023). Generation of Random Numbers with Laplace Distribution (https://www.mathworks.com/matlabcentral/fileexchange/53397-generation-of-random-numbers-with-laplace-distribution), MATLAB Central File Exchange. Retrieved February 15, 2023.
Data usage terms
Any usage of the data is subject to consent to the following conditions:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The database includes five datasets. Three datasets were extracted from a dataset published by X (the Twitter Transparency website) that includes tweets from malicious accounts trying to manipulate public opinion in the Kingdom of Saudi Arabia. We focused on sports and banking topics when extracting data. Although the propagandist tweets were published by malicious accounts, as X (Twitter) stated, the individual tweets were not classified as propaganda or not. Propagandists usually mix propaganda and non-propaganda tweets in an attempt to hide their identities. Therefore, it was necessary to classify their tweets as propaganda or not, based on the propaganda technique used. Since the datasets are very large, we annotated a sample of 2,100 tweets. As for the reliable account data, we were keen to identify reliable Saudi sources; their tweets that discussed the same topics discussed by the malicious users were then crawled. There are two datasets for reliable users, covering sports and banking topics. The dataset is made up of 16,355,558 tweets from propagandist users and 156,524 tweets from reliable users for the period January 1, 2019, to December 31, 2020.
This is a synthetic dataset that can be used by users who are interested in benchmarking methods of explainable artificial intelligence (XAI) for geoscientific applications. The dataset is specifically inspired by a climate forecasting setting (seasonal timescales) where the task is to predict regional climate variability given global climate information lagged in time. The dataset consists of a synthetic input X (a series of 2D arrays of random fields drawn from a multivariate normal distribution) and a synthetic output Y (a scalar series) generated by using a nonlinear function F: R^d -> R.
The synthetic input aims to represent temporally independent realizations of anomalous global fields of sea surface temperature, the synthetic output series represents some type of regional climate variability that is of interest (temperature, precipitation totals, etc.) and the function F is a simplification of the climate system.
Since the nonlinear function F that is used to generate the output given the input is known, we also derive and provide the attribution of each output value to the corresponding input features. Using this synthetic dataset, users can train any AI model to predict Y given X and then implement XAI methods to interpret it. Based on the "ground truth" attribution of F, the user can assess the faithfulness of any XAI method.
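As a minimal sketch of this workflow (placeholder file names; the dataset's actual NetCDF variables will differ), one can fit any regression model on the flattened input, derive an attribution estimate, and compare it against the provided ground truth:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder arrays: X (n_samples, n_lat, n_lon) synthetic inputs, Y (n_samples,) outputs,
# A_true (n_samples, n_lat, n_lon) ground-truth attributions derived from F.
X = np.load("X_synthetic.npy")
Y = np.load("Y_synthetic.npy")
A_true = np.load("attribution_true.npy")

n, h, w = X.shape
model = Ridge(alpha=1.0).fit(X.reshape(n, -1), Y)

# For a linear surrogate, a simple per-sample attribution is coefficient * input value.
A_est = model.coef_.reshape(1, h, w) * X

# Faithfulness check: correlation between estimated and ground-truth attribution maps.
corr = np.corrcoef(A_est.reshape(n, -1).ravel(), A_true.reshape(n, -1).ravel())[0, 1]
print(f"attribution correlation: {corr:.3f}")
```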
NOTE: the spatial configuration of the observations in the NetCDF database file conforms to the planetocentric coordinate system (89.5N - 89.5S, 0.5E - 359.5E), where longitude is measured positive heading east from the prime meridian.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
[Image: Peertube "follow" graph]
Above is the Peertube "follow" graph. The colours correspond to the language of the server (purple: unknown, green: French, blue: English, black: German, orange: Italian, grey: others).
Decentralized machine learning, where each client keeps its own data locally and uses its own computational resources to collaboratively train a model by exchanging peer-to-peer messages, is increasingly popular, as it enables better scalability and control over the data. A major challenge in this setting is that learning dynamics depend on the topology of the communication graph, which motivates the use of real graph datasets for benchmarking decentralized algorithms. Unfortunately, existing graph datasets are largely limited to for-profit social networks crawled at a fixed point in time and often collected at the user scale, where links are heavily influenced by the platform and its recommendation algorithms. The Fediverse, which includes several free and open-source decentralized social media platforms such as Mastodon, Misskey, and Lemmy, offers an interesting real-world alternative. We introduce Fedivertex, a new dataset covering seven social networks from the Fediverse, crawled on a weekly basis.
We refer to our paper for a detailed presentation of the graphs: [SOON]
We implemented a simple Python API to interact easily with the dataset: https://pypi.org/project/fedivertex/
pip3 install fedivertex
This package automatically downloads the dataset and generates NetworkX graphs.
from fedivertex import GraphLoader

loader = GraphLoader()  # instantiate the loader before listing or loading graphs

# List available graph types for a given software, here e.g. federation and active_user
loader.list_graph_types("mastodon")

# G contains the NetworkX graph of the giant component of the active users graph at the 1st date of collection
G = loader.get_graph(software="mastodon", graph_type="active_user", index=0, only_largest_component=True)
We also provide a Kaggle notebook demonstrating simple operations using this library: https://www.kaggle.com/code/marcdamie/exploratory-graph-data-analysis-of-fedivertex
The dataset contains graphs crawled on a daily basis on 7 social networks from the Fediverse. Each graph quantifies/characterizes the interaction differently depending on the information provided by the public API of these networks.
We briefly present the graphs below (NB: the term "instance" refers to servers on the Fediverse):
These graphs provide diverse perspectives on the Fediverse, as they capture more or less subtle phenomena. For example, "federation" graphs are dense, while "intra-instance" graphs are sparse. We have performed a detailed exploratory data analysis in this notebook.
Our CSV files are formatted so that they can be directly imported into Gephi for graph visualization. Find below an example Gephi visualization of the Misskey "active users" graph (without the misskey.io node). The colours correspond to the language of the server (purple: unknown, red: Japanese, brown: Korean, blue: English, yellow: Chinese).
 dataset have been discontinued as of Dec. 31, 2019, and users are strongly encouraged to shift to the successor IMERG datasets (doi: 10.5067/GPM/IMERG/3B-HH-E/06, 10.5067/GPM/IMERG/3B-HH-L/06).
These data were output from the TRMM Multi-satellite Precipitation Analysis (TMPA), the Near Real-Time (RT) processing stream. The latency was about seven hours from the observation time, although processing issues may delay or prevent this schedule. Users should be mindful that the price for the short latency of these data is the reduced quality as compared to the research quality product 3B42. This particular dataset is an intermediate variable (VAR) rainrate IR estimate.
Data files start with a header consisting of a 2880-byte record containing ASCII characters. The header line makes the file nearly self-documenting, in particular spelling out the variable and version names, and giving the units of the variables.
Immediately after the header follow 3 data fields, "precip", "error", and "# pixels", with byte counts of 1382400, 1382400, and 691200, respectively. The first two are 2-byte integers, and the third is 1-byte. All fields are 1440x480 grid boxes (0-360E, 60N-S). The first grid box center is at (0.125E, 59.875N). The grid increments most rapidly to the east. Grid boxes without valid data are filled with the (2-byte integer) "missing" value -31999. Valid estimates are only provided in the band 50N-S. These binary data sets are in IEEE "big-endian" byte order.
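A minimal sketch of reading one of these binary granules, based on the layout just described (placeholder file name; the 1-byte "# pixels" field is assumed unsigned):

```python
import numpy as np

# Grid dimensions per the description above: 1440 longitudes x 480 latitudes,
# with longitude varying fastest (east increments most rapidly).
NX, NY = 1440, 480

with open("3B4XRT_granule.bin", "rb") as f:  # placeholder file name
    header = f.read(2880).decode("ascii", errors="replace")  # 2880-byte ASCII header
    precip = np.frombuffer(f.read(NX * NY * 2), dtype=">i2").reshape(NY, NX)
    error = np.frombuffer(f.read(NX * NY * 2), dtype=">i2").reshape(NY, NX)
    n_pixels = np.frombuffer(f.read(NX * NY), dtype=np.uint8).reshape(NY, NX)

# Mask grid boxes without valid data (missing value -31999), per the description above.
precip_masked = np.ma.masked_equal(precip, -31999)
print(header[:80])
print(precip_masked.max())
```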
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered where the books is The Contax RTS & Yashica SLR book : for Contax RTS, Yashica FR, DR1, FR11, FX-1, FX2, TL-Electro, Electro-X, Electro AX & TL Super users. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2022 Brazilian Presidential Election
This dataset contains 13,910,048 tweets from 1,346,340 users, extracted using 157 search terms over 56 different days between January 1st and June 21st, 2023.
All tweets in this dataset are in Brazilian Portuguese.
Data Usage
The dataset contains textual data from tweets, making it suitable for various NLP analyses, such as sentiment analysis, bias or stance detection, and toxic language detection. Additionally, users and tweets can be linked to create social graphs, enabling Social Network Analysis (SNA) to study polarization, communities, and other social dynamics.
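As one illustration of the SNA use mentioned above, a minimal sketch of building a retweet graph is shown below; the file and column names are hypothetical and must be adapted to the released schema.

```python
import pandas as pd
import networkx as nx

# Hypothetical file and column names; adapt to the actual tweet files in this dataset.
tweets = pd.read_csv("tweets.csv")

G = nx.DiGraph()
for _, row in tweets.dropna(subset=["retweeted_user_id"]).iterrows():
    # Edge from the retweeting user to the original author.
    G.add_edge(row["user_id"], row["retweeted_user_id"])

# Example SNA measure: the most retweeted accounts by in-degree.
top = sorted(G.in_degree(), key=lambda x: x[1], reverse=True)[:10]
print(top)
```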
Extraction Method
This data set was extracted using Twitter's (now X) official API, when Academic Research API access was still available, following the pipeline:
Further Information
For more details, visit:
DOI: 10.5281/zenodo.14834434