Since Persian datasets are really scarce, I scraped Twitter in order to make a new Persian dataset.
The tweets were pulled from Twitter using snscrape,
and manual tagging was done based on Ekman's six basic emotions.
For privacy's sake, I pre-processed the tweets and removed usernames, display names, and mentions. I also deleted the timestamps and tweet IDs.
Columns: 1) tweet 2) replyCount 3) retweetCount 4) likeCount 5) quoteCount 6) hashtags 7) sourceLabel 8) emotion
Please leave an upvote if you find this relevant. :)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.
from IPython.display import Markdown, display
display(Markdown(filename="../input/Car-Person-v2-Roboflow/README.roboflow.txt"))  # render the README file itself rather than the path string
In this notebook, I have processed the images with Roboflow, because the COCO-formatted dataset had images of different dimensions and was not split into the required subsets. To train a custom YOLOv7 model we need the objects in the dataset to be annotated. To do so I have taken the following steps:
Image Credit - jinfagang
!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
%cd yolov7
!pip install -qr requirements.txt
!pip install -q roboflow
!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
import os
import glob
import wandb
import torch
from roboflow import Roboflow
from kaggle_secrets import UserSecretsClient
from IPython.display import Image, clear_output, display # to display images
print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!
try:
    user_secrets = UserSecretsClient()
    wandb_api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=wandb_api_key)
    anonymous = None
except:
    wandb.login(anonymous='must')
    print('To use your W&B account, '
          'go to Add-ons -> Secrets and provide your W&B access token. Use the label name WANDB. '
          'Get your W&B access token from here: https://wandb.ai/authorize')
wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")
In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.
In Roboflow, we can choose between two paths:
Roboflow annotation workflow (screenshot): https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG
user_secrets = UserSecretsClient()
roboflow_api_key = user_secrets.get_secret("roboflow_api")
rf = Roboflow(api_key=roboflow_api_key)
project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
dataset = project.version(2).download("yolov7")
Here, I am able to pass a number of arguments:
- img: define the input image size
- batch: determine the batch size
A rough sketch of the resulting training command is shown below.
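As an illustration only (not necessarily the exact command used in this run), a training invocation with the Roboflow export above could look like the following; the flag names follow the YOLOv7 repository's train.py, and the values are placeholders to adjust for your GPU and dataset.
# Illustrative values; {dataset.location} comes from the Roboflow download cell above.
!python train.py --img-size 640 --batch-size 16 --epochs 30 \
    --data {dataset.location}/data.yaml \
    --weights yolov7.pt --device 0 \
    --name yolov7-car-person-custom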
https://creativecommons.org/publicdomain/zero/1.0/
Nonindigenous aquatic species introductions are widely recognized as major stressors to freshwater ecosystems, threatening native endemic biodiversity and causing negative impacts to ecosystem services as well as damaging local and regional economies.
It is therefore necessary to monitor spatial and temporal trends and spread in order to guide prevention and control efforts and to develop effective policy aimed at mitigating impacts.
You can also use this data to improve your skills in analyzing spatial and temporal patterns; for that I recommend reviewing this course first.
This Kaggle dataset contains nonindigenous aquatic species introductions in the United States of America from 1616 to 2016.
Two sources were used:
1. DAT_SPECIES: Information about the species.
Dataset belongs to the U.S. EPA Office of Research and Development, and can be downloaded from various open data platforms. See data.gov or data.world.
The provided data were merged and lightly cleaned. Its features are:
2. DAT_SPATIAL: Georeferenced information for the USA by state.
Imported and preprocessed from geopandas datasets and a brief web scraping. Its features are:
Thanks to Kaggle! The platform, its resources, and the community in general are great.
This data can provide important insights into the historical drivers of such events and aid in forecasting future patterns; and if you add the spatiotemporal information, the analysis becomes even more complete.
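As a loose illustration of combining the two files, one might count introductions per state and attach the georeferenced attributes. The file and column names below are hypothetical, since the feature lists above are not spelled out; adjust them to the real columns.
import pandas as pd

species = pd.read_csv("DAT_SPECIES.csv")   # hypothetical: one row per introduction, with State and Year
spatial = pd.read_csv("DAT_SPATIAL.csv")   # hypothetical: one row per state with georeferenced attributes

# Count introductions per state and join them to the spatial table.
counts = species.groupby("State").size().rename("n_introductions").reset_index()
merged = spatial.merge(counts, on="State", how="left")
print(merged.sort_values("n_introductions", ascending=False).head())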
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This is a modified version of Preet Viradiya's dataset "Brian Tumor Dataset", but all the tumor images have been preprocessed, normalized, and the tumor location metadata has been manually gathered into a separate dataset.
The full preprocess sequence is detailed in the first half of this notebook in the original dataset: Brain tumor image preprocessing & clasifier
DISCLAIMER: I am no neuroscientist, so this data should only be used for practice purposes, as some of the tumor location data is bound to be inaccurate or plainly wrong.
The data is split in two datasets:
1. image_df
contains 2500 separate 128x128 px images of brain cancer scans, one in each row. Reshaping a row into a 128x128 array is necessary in order to display it correctly.
2. data_df
contains four integers per entry: the first two are the coordinates of the top-left corner of the approximate rectangle containing the tumor in the same-index image, and the following two values contain the rectangle's width and height respectively.
An example of loading and displaying data from this dataset has been included into the notebooks section under the name Dataset Usage Basic Example.
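As a minimal sketch along the same lines (the file names and the assumption that each image row holds exactly the 128x128 = 16,384 pixel values are mine; the notebook above is the authoritative version):
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Hypothetical file names -- adjust to the actual CSVs in this dataset.
image_df = pd.read_csv("image_df.csv")
data_df = pd.read_csv("data_df.csv")

i = 0
img = image_df.iloc[i].to_numpy().reshape(128, 128)   # one row -> 128x128 scan
x, y, w, h = data_df.iloc[i].to_numpy()[:4]           # top-left corner, width, height

fig, ax = plt.subplots()
ax.imshow(img, cmap="gray")
ax.add_patch(patches.Rectangle((x, y), w, h, edgecolor="red", facecolor="none"))
plt.show()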
Thanks to Preet Viradiya for providing the original images
This dataset's goal is to find a way to improve a hypothetical brain scan classifier. The question of "does this brain have cancer?" has been answered in the original dataset; using regression on this modified dataset, not only can the classification question be answered, but a model can also be trained to point out exactly where the tumor is located.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset provides an in-depth look into the global CO2 emissions at the country-level, allowing for a better understanding of how much each country contributes to the global cumulative human impact on climate. It contains information on total emissions as well as from coal, oil, gas, cement production and flaring, and other sources. The data also provides a breakdown of per capita CO2 emission per country - showing which countries are leading in pollution levels and identifying potential areas where reduction efforts should be concentrated. This dataset is essential for anyone who wants to get informed about their own environmental footprint or conduct research on international development trends.
This dataset provides a country-level survey of global fossil CO2 emissions, including total emissions, emissions from coal, oil, gas, cement, flaring and other sources as well as per capita emissions.
For researchers looking to quantify global CO2 emission levels by country over time and understand the sources of these emissions this dataset can be a valuable resource.
The data is organized using the following columns:
- Country: the name of the country
- ISO 3166-1 alpha-3: the three-letter code for the country
- Year: the year of the survey data
- Total: the total amount of CO2 emitted by the country in that year
- Coal: the amount of CO2 emitted by coal in that year
- Oil: the amount emitted by oil
- Gas: the amount emitted by gas
- Cement: the amount emitted by cement production
- Flaring: flaring emission levels
- Other: other sources such as industrial processes
In addition there is one extra column, Per Capita, which provides an insight into how much carbon dioxide is emitted per individual in each country.
To make use of these columns you can sum up the Total column for a specific region, work out how much each source contributes to the Total column (for example, what percentage it accounts for), or construct dashboard visualizations to explore which sources are responsible for higher emissions across different countries or clusters of similar countries. You can also examine whether individual countries focusing on Flaring (emissions associated with burning off natural gas while drilling) can improve their overall fossil carbon emission profiles.
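A small pandas sketch of that kind of aggregation (the file name matches the one listed below, and the column names follow the description above):
import pandas as pd

df = pd.read_csv("GCB2022v27_MtCO2_flat.csv")

# Total emissions per year, summed over all countries.
yearly_total = df.groupby("Year")["Total"].sum()

# Share of each source in the latest year, as a percentage of Total per country.
sources = ["Coal", "Oil", "Gas", "Cement", "Flaring", "Other"]
latest = df[df["Year"] == df["Year"].max()].copy()
latest[sources] = latest[sources].div(latest["Total"], axis=0) * 100

print(yearly_total.tail())
print(latest[["Country"] + sources].head())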
The main purpose behind this dataset was to give government bodies, private organizations, universities, NGOs, and research agencies alike the detailed, comprehensive, and verified information needed to apply analytical techniques, track environmental changes across regions, and develop efficient, directed ways of managing emissions.
With insights gleaned from this dataset one can begin to identify strategies for pollutant mitigation and combating climate change, make decisions centered around sustainable development, continent-wide unified plans, and policy implementations, and keep an eye out for evidence of regional discrepancies. For anyone working on improving quality of life in this way, "Global Fossil Carbon Dioxide Emissions: Country Level Survey 2002-2022" could be exactly what is needed.
- Using the per capita emissions data, develop a reporting system to track countries' progress in meeting carbon emission targets and give policy recommendations for how countries can reach those targets more quickly.
- Analyze the correlation between different fossil fuel sources and CO2 emissions to understand how best to reduce CO2 emissions at a country-level.
- Create an interactive map showing global CO2 levels over time that allows users to visualize trends by country or region across all fossil fuel sources
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: GCB2022v27_MtCO2_flat.csv | Column name | Description ...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Labeled datasets are useful in machine learning research.
This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
Tables: 1) annotations_bbox 2) dict 3) images 4) labels
Update Frequency: Quarterly
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images
https://cloud.google.com/bigquery/public-data/openimages
APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.
Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.
The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
Banner Photo by Mattias Diesel from Unsplash.
Which labels are in the dataset? Which labels have "bus" in their display names? How many images of a trolleybus are in the dataset? What are some landing pages of images with a trolleybus? Which images with cherries are in the training set?
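As one possible starting point for questions like these, the query below looks up label codes whose display name mentions "bus". The table and column names are my assumptions about the public Open Images BigQuery schema; verify them in the BigQuery console before relying on this.
from google.cloud import bigquery

client = bigquery.Client()

# Assumed schema: bigquery-public-data.open_images.dict with columns
# label_name and label_display_name -- check the actual table schema first.
query = """
SELECT label_name, label_display_name
FROM `bigquery-public-data.open_images.dict`
WHERE LOWER(label_display_name) LIKE '%bus%'
"""
for row in client.query(query).result():
    print(row.label_name, row.label_display_name)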
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
So what can you try building? Here are some suggestions:
- Start with an image classifier. Use the masterCategory column from styles.csv and train a convolutional neural network (a sketch follows below).
- Try adding more sophisticated classification by predicting the other category labels in styles.csv.
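A minimal sketch of such a classifier, assuming styles.csv has id and masterCategory columns and the images live in an images/ folder named <id>.jpg; paths, image size, and hyperparameters here are illustrative, not a tuned recipe.
import pandas as pd
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class FashionDataset(Dataset):
    def __init__(self, csv_path="styles.csv", img_dir="images"):
        self.df = pd.read_csv(csv_path, on_bad_lines="skip")
        self.img_dir = img_dir
        self.classes = sorted(self.df["masterCategory"].unique())
        self.to_tensor = transforms.Compose([
            transforms.Resize((80, 60)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(f"{self.img_dir}/{row['id']}.jpg").convert("RGB")
        label = self.classes.index(row["masterCategory"])
        return self.to_tensor(img), label

dataset = FashionDataset()
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# A deliberately small CNN; swap in a pretrained backbone for better accuracy.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 20 * 15, len(dataset.classes)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:   # one pass shown; loop over epochs in practice
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()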
By Ben Jones [source]
This remarkable dataset chronicles the world record progression of the men's mile run, containing detailed information on each athlete's time, their name, nationality, date of their accomplishment and the location of their event. It allows us to look back in history and get a comprehensive overview of how this track event has progressed over time. Analyzing this information can help us understand how training and technology have improved the event over the years, as well as study different athletes' performances and learn how some athletes have pushed beyond their limits or fallen short. This valuable resource is an essential source for anyone intrigued by the cutting edge achievements in men's mile running world records. Discovering powerful insights from this dataset can allow us to gain perspective into not only our own personal goals but also uncover ideas on how we could continue pushing our physical boundaries by watching past successes. Explore and comprehend for yourself what it means to be a true athlete at heart!
This guide provides an introduction on how best to use this dataset in order to analyze various aspects involving the men’s mile run world records. We will focus on analyzing specific fields such as date, athlete name & nationality, time taken for completion and auto status by using statistical methods and graphical displays of data.
In order to use this data effectively it is important that you understand what each field measures:
- Time: the amount of time it took for an athlete to finish a race, measured in minutes and seconds (example: 3:54).
- Auto: whether or not a pacemaker was used during a specific race (example: yes/no).
- Athlete Name & Nationality: the name and nationality of the athlete who set the record (example: Usain Bolt - Jamaica).
- Date: the year in which a specific record was set by an individual (example: 2021).
- Venue: the location at which the record was set (example: London Olympic Stadium).
Now that you understand which fields measure what, let's discuss various ways you can use these features. Analyzing trends in historical sporting performances has long been used as a means of understanding changes brought about by new training methods, technologies, and so on over time. With this dataset that can be done using basic statistical displays like bar graphs and average analysis, or more advanced methods such as regression analysis or even Bayesian approaches. The first thing anyone dealing with this sort of data should do is inspect it for wacky outliers before beginning more rigorous analysis; if you discover potentially unreasonable values, it is best to discard them before building models or readings on top of them (this sort of elimination is common practice). After cleaning your workspace, move on to building interactive visual displays by plotting different columns against one another: plotting time against date lets us see changes over time from 1861 until now, and plotting time against Auto lets us see any differences between paced and unpaced races.
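A small plotting sketch along those lines; the file name, column names, and time format ("m:ss") are assumptions about this dataset's CSV, so adjust them as needed.
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names -- adjust to the actual CSV.
records = pd.read_csv("mile_world_records.csv", parse_dates=["Date"])

# Convert "m:ss" times to seconds so they can be plotted on a numeric axis.
records["Seconds"] = pd.to_timedelta("0:" + records["Time"]).dt.total_seconds()

records.plot(x="Date", y="Seconds", marker="o", legend=False)
plt.ylabel("World record time (seconds)")
plt.title("Men's mile world record progression")
plt.show()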
- Comparing individual athletes and identifying those who have consistently pushed the event to higher levels of performance.
- Analyzing national trends related to improvement in track records over time, based on differences in training and technology.
- Creating a heatmap to visualize the progression of track records around the world and locate regions with a particularly strong historical performance in this event
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. -...
From Be Slavery Free watchdog group. Hand-transcribed into Python data frame in my Enhanced Cacao Data Gathering notebook, and saved out as csv. Data is as presented in the graphical PDF scorecard document.
I encoded the colored bunny and egg values as numbers:
- 1 = "Leading the industry on policy"
- 2 = "Starting to implement good policies"
- 3 = "Needs more work on policy and implementation"
- 4 = "Needs to catch up with the industry"
- 0 = "Did not respond to survey; lacking in transparency"
Note: for companies that did not respond to the industry survey but were listed on the scorecard with a black egg or bunny, a single 0 was carried across all ratings columns in my manual transcription of the data set.
The SubsidiaryIndustry column would probably be best parsed out (the delimiter is '-') and split into separate columns (e.g. "Subsidiary" and "Industry"), or even one-hot encoded (e.g. either generic "Subsidiary1", "Subsidiary2", etc., or specific "Chocolate", "Trader", etc., with binary values).
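A rough pandas sketch of that parsing; the CSV name is hypothetical, the column name matches the description above.
import pandas as pd

df = pd.read_csv("be_slavery_free_chocolate.csv")  # hypothetical file name

# Split the '-'-delimited SubsidiaryIndustry values into one-hot indicator columns.
dummies = df["SubsidiaryIndustry"].str.get_dummies(sep="-")
df = pd.concat([df, dummies], axis=1)
print(dummies.columns.tolist())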
Imported for use cross referencing with the Flavors of Cacao datasets scraped by the import script or analyzed by various cacao analytics exercises.
The 2021 scorecard is also available (though I haven't personally transcribed it yet).
The latest version adds two more csv files, transformations of the first file provided here.
- be_slavery_free_chocolate_normalized.csv takes the scale I transcribed from the original scorecard (1-6 expressed in green through red, plus 0 for missing values) and refactors and normalizes it to the 0 to 1 scale used, for example, by the stars() plot in R.
- be_slavery_free_chocolate_normalized_split.csv takes the normalized set and splits SubsidiaryIndustry into a separate row for each "-"-delimited value. I also manually went through the resulting data frame to remove duplicates, e.g. for traders/manufacturers/processors.
Both of these data sets can more easily be used with the stars() plotting function and with other older functions that require normalized data. For the stars() function specifically, be sure to use one of the text columns as row names (with row.names(df) = df$Company, for example), since the function implicitly expects to use the row name as the star plot label in a faceted display.
Photo by Ákos Helgert: https://www.pexels.com/photo/yellow-cacao-fruit-8900912/
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of various Agentic AI (autonomous AI) applications across multiple industries in 2025. It contains detailed records of how AI is being utilized to automate complex tasks, improve efficiency, and generate measurable outcomes. The dataset is designed to help researchers, data scientists, and businesses understand the current state and potential of Agentic AI in different sectors.
Dataset Features:
Industry: The sector where Agentic AI is applied (e.g., Healthcare, Finance, Manufacturing).
Application Area: The specific task or function performed by the AI agent (e.g., Fraud Detection, Predictive Maintenance).
AI Agent Name: The name of the AI system or agent deployed (e.g., HealthAI Monitor, FinSecure Agent).
Task Description: A brief description of the AI's function or role.
Technology Stack: The technologies powering the AI (e.g., Machine Learning, NLP, Computer Vision).
Outcome Metrics: The measurable impact of the AI deployment (e.g., 30% reduction in ER visits).
Deployment Year: The year the AI system was deployed (ranging from 2023 to 2025).
Geographical Region: The region where the AI application is implemented (e.g., North America, Asia, Europe).
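A quick first look at the table could group over these columns; the file name below is an assumption, the column names follow the feature list above.
import pandas as pd

df = pd.read_csv("agentic_ai_applications_2025.csv")  # hypothetical file name

# How many recorded applications per industry and deployment year?
print(df.groupby(["Industry", "Deployment Year"]).size().unstack(fill_value=0))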
https://creativecommons.org/publicdomain/zero/1.0/
During the 2019 Australian election I noticed that almost everything I was seeing on Twitter was unusually left-wing. So I decided to scrape some data and investigate. Unfortunately my sentiment analysis has so far been too inaccurate to come to any useful conclusions. I decided to share the data so that others may be able to help with the sentiment or any other interesting analysis.
Over 180,000 tweets collected using Twitter API keyword search between 10.05.2019 and 20.05.2019. Columns are as follows:
The latitude and longitude of user_location is also available in location_geocode.csv. This information was retrieved using the Google Geocode API.
Thanks to Twitter for providing the free API.
There are a lot of interesting things that could be investigated with this data. Primarily I was interested to do sentiment analysis, before and after the election results were known, to determine whether Twitter users are indeed a left-leaning bunch. Did the tweets become more negative as the results were known?
Other ideas for investigation include:
Take into account retweets and favourites to weight overall sentiment analysis.
Which parts of the world are interested in (i.e. tweet about) the Australian elections, apart from Australia?
How do the users who tweet about this sort of thing tend to describe themselves?
Is there a correlation between when the user joined Twitter and their political views (this assumes the sentiment analysis is already working well)?
Predict gender from username/screen name and segment tweet count and sentiment by gender
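For the sentiment idea, here is a minimal starting point. The CSV name, column names, and election cutoff date of 18 May 2019 are assumptions on my part, and TextBlob is just one of several off-the-shelf sentiment options.
import pandas as pd
from textblob import TextBlob

tweets = pd.read_csv("auspol2019.csv")  # hypothetical file name; adjust to this dataset

# Polarity in [-1, 1]: negative to positive.
tweets["polarity"] = tweets["full_text"].astype(str).map(
    lambda t: TextBlob(t).sentiment.polarity
)

# Compare average sentiment before and after the results were known.
tweets["created_at"] = pd.to_datetime(tweets["created_at"], utc=True)
cutoff = pd.Timestamp("2019-05-18", tz="UTC")
print(tweets.groupby(tweets["created_at"] > cutoff)["polarity"].mean())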
The Gender Statistics database is a comprehensive source for the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
The data is split into several files, with the main one being Data.csv. The Data.csv contains all the variables of interest in this dataset, while the others are lists of references and general nation-by-nation information.
Data.csv contains the following fields:
I couldn't find any metadata for these, and I'm not qualified to guess at what each of the variables means. I'll list the variables for each file, and if anyone has any suggestions (or, even better, actual knowledge/citations) as to what they mean, please leave a note in the comments and I'll add your info to the data description.
Country-Series.csv
Country.csv
FootNote.csv
Series-Time.csv
Series.csv
This dataset was downloaded from The World Bank's Open Data project. The summary of the Terms of Use of this data is as follows:
You are free to copy, distribute, adapt, display or include the data in other products for commercial and noncommercial purposes at no cost subject to certain limitations summarized below.
You must include attribution for the data you use in the manner indicated in the metadata included with the data.
You must not claim or imply that The World Bank endorses your use of the data, or use The World Bank’s logo(s) or trademark(s) in conjunction with such use.
Other parties may have ownership interests in some of the materials contained on The World Bank Web site. For example, we maintain a list of some specific data within the Datasets that you may not redistribute or reuse without first contacting the original content provider, as well as information regarding how to contact the original content provider. Before incorporating any data in other products, please check the list: Terms of use: Restricted Data.
-- [ed. note: this last is not applicable to the Gender Statistics database]
The World Bank makes no warranties with respect to the data and you agree The World Bank shall not be liable to you in connection with your use of the data.
This is only a summary of the Terms of Use for Datasets Listed in The World Bank Data Catalogue. Please read the actual agreement that controls your use of the Datasets, which is available here: Terms of use for datasets. Also see World Bank Terms and Conditions.