12 datasets found
  1. Persian tweets emotional dataset

    • kaggle.com
    Updated Jun 26, 2021
    Cite
    Behdad Karimi (2021). Persian tweets emotional dataset [Dataset]. https://www.kaggle.com/behdadkarimi/persian-tweets-emotional-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Behdad Karimi
    Description

    New Persian Dataset

    Since Persian datasets are scarce, I scraped Twitter to build a new Persian dataset.

    The tweets were pulled from Twitter using snscrape, and manual tagging was done based on Ekman's six main emotions. For privacy's sake, I pre-processed the tweets and removed usernames, display names, and mentions; I also deleted the timestamps and tweet IDs.

    Columns:
    • tweet
    • replyCount
    • retweetCount
    • likeCount
    • quoteCount
    • hashtags
    • sourceLabel
    • emotion
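
    A minimal loading sketch (the CSV file name here is an assumption; check the dataset's file listing):

    # Sketch: load the dataset and inspect the emotion label distribution
    import pandas as pd

    df = pd.read_csv("persian_tweets.csv")   # hypothetical file name
    print(df.columns.tolist())               # expected: tweet, replyCount, ..., emotion
    print(df["emotion"].value_counts())      # distribution over Ekman's six emotions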

    Please leave an upvote if you find this relevant. :)

  2. Custom Yolov7 On Kaggle On Custom Dataset

    • universe.roboflow.com
    zip
    Updated Jan 29, 2023
    Cite
    Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Owais Ahmad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Car Bounding Boxes
    Description

    Custom Training with YOLOv7 🔥

    Some Important links

    Contact Information

    Objective

    To showcase custom object detection on the given dataset by training the model and running inference with the newly launched YOLOv7.

    Data Acquisition

    The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

    from IPython.display import Markdown, display
    
    # Markdown() renders a string, so read the README's contents before displaying it
    display(Markdown(open("../input/Car-Person-v2-Roboflow/README.roboflow.txt").read()))
    

    Custom Training with YOLOv7 🔥

    In this notebook, I processed the images with Roboflow, because the COCO-formatted dataset had images of differing dimensions and was not split into train/validation/test sets. To train a custom YOLOv7 model, we need to recognize the objects in the dataset. To do so, I took the following steps:

    • Export the dataset to YOLOv7
    • Train YOLOv7 to recognize the objects in our dataset
    • Evaluate our YOLOv7 model's performance
    • Run test inference to view performance of YOLOv7 model at work

    📦 YOLOv7

    [Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG]

    Image Credit - jinfagang

    Step 1: Install Requirements

    !git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
    %cd yolov7
    !pip install -qr requirements.txt
    !pip install -q roboflow
    

    Downloading the YOLOv7 starting checkpoint

    !wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
    
    import os
    import glob
    import wandb
    import torch
    from roboflow import Roboflow
    from kaggle_secrets import UserSecretsClient
    from IPython.display import Image, clear_output, display  # to display images
    
    print(f"Setup complete. Using torch {torch.__version__} "
          f"({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
    

    [Image: https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67]

    I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

    YOLOv7-Car-Person-Custom

    try:
        user_secrets = UserSecretsClient()
        wandb_api_key = user_secrets.get_secret("wandb_api")
        wandb.login(key=wandb_api_key)
        anonymous = None
    except Exception:
        # Fall back to an anonymous login if no secret is configured
        wandb.login(anonymous='must')
        print('To use your W&B account, go to Add-ons -> Secrets and provide your '
              'W&B access token. Use the label name WANDB. '
              'Get your W&B access token from here: https://wandb.ai/authorize')
    
    wandb.init(project="YOLOvR", name="7. YOLOv7-Car-Person-Custom-Run-7")
    

    Step 2: Assemble Our Dataset

    [Image: https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png]

    In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

    In Roboflow, we can choose between two paths:

    Version v2 (Aug 12, 2022) looks like this.

    [Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG]

    # Pull the Roboflow API key from Kaggle secrets, then download version 2
    # of the dataset in YOLOv7 format
    user_secrets = UserSecretsClient()
    roboflow_api_key = user_secrets.get_secret("roboflow_api")
    
    rf = Roboflow(api_key=roboflow_api_key)
    project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
    dataset = project.version(2).download("yolov7")
    

    Step 3: Training the Custom Pretrained YOLOv7 Model
    
    Here, I am able to pass a number of arguments:
    • img: define input image size
    • batch: determine
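
    A hedged sketch of what the full training call might look like from this notebook. The flag names follow the YOLOv7 repository's train.py, but the specific values (image size, batch size, epoch count) are illustrative assumptions; run !python train.py --help to confirm them.

    # Illustrative training cell -- values are assumptions, not the author's exact run
    !python train.py --img-size 640 --batch-size 16 --epochs 30 \
      --data {dataset.location}/data.yaml --weights yolov7.pt --device 0 \
      --name yolov7-car-person-custom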

  3. Biological invasions in USA

    • kaggle.com
    Updated Feb 20, 2021
    Cite
    Lázaro (2021). Biological invasions in USA [Dataset]. https://www.kaggle.com/lazaro97/biological-invasions/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2021
    Dataset provided by
    Kaggle
    Authors
    Lázaro
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Nonindigenous aquatic species introductions are widely recognized as major stressors to freshwater ecosystems, threatening native endemic biodiversity and causing negative impacts to ecosystem services as well as damaging local and regional economies.

    It is thus necessary to monitor spatial and temporal trends in their spread, in order to guide prevention and control efforts and to develop effective policy aimed at mitigating impacts.

    Alternatively, you can use this dataset to improve your skills in analyzing spatial and temporal patterns; for that, I recommend reviewing this course first.

    Content

    This Kaggle dataset contains nonindigenous aquatic species introductions from 1616 to 2016 in the United States of America.

    Two sources were used:

    1. DAT_SPECIES: Information on species.

    The dataset belongs to the U.S. EPA Office of Research and Development and can be downloaded from various open data platforms; see data.gov or data.world.

    The provided data were merged, and a small cleaning pass was carried out. Its features are:

    • Occurrence: Unique ID.
    • Sciname: The scientific name of the species.
    • Kingdom: There are two large groups: animals and plants.
    • Group: Taxonomic group. Animals only.
    • Family: Taxonomic family. Animals only.
    • State: The state the occurrence was reported in.
    • NativeRegion: Continent (or region) of origin.
    • Centroid: Is this known to be a centroid?
    • DecimalLat: Original latitude provided by the source database.
    • DecimalLon: Original longitude provided by the source database.
    • DateObserved: Date the species was reported observed.
    • Year: The year the species was observed. Useful if the date is missing.
    • Collector: The original collector of the specimen.
    • Recorderby: Name of the discovering person. Useful if the collector is missing.
    • HUC8SkM: Area of HUC8 (in square kilometers)
    • HUC10SkM: Area of HUC10 (in square kilometers)
    • HUC12SkM: Area of HUC12 (in square kilometers)

    2. DAT_SPATIAL: Georeferenced information on USA states.

    Imported and preprocessed from geopandas datasets and a brief web scrape. Its features are:

    • Name: Name of the state.
    • Geometry: Shape of the state, for boundary plots.
    • Iso_code: Codes for the names of the principal subdivisions.
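
    A hedged sketch of joining the two sources (the file names and the State/Name join keys are assumptions based on the feature lists above):

    # Sketch: count introductions per state and attach the state records
    import pandas as pd

    species = pd.read_csv("DAT_SPECIES.csv")   # hypothetical file name
    spatial = pd.read_csv("DAT_SPATIAL.csv")   # hypothetical file name

    per_state = (species.groupby("State").size()
                 .rename("n_introductions").reset_index())
    merged = per_state.merge(spatial, left_on="State", right_on="Name", how="left")
    print(merged.sort_values("n_introductions", ascending=False).head())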

    Acknowledgements

    Thanks to Kaggle! The platform, its resources, and the community in general are great.

    Inspiration

    This data can provide important insights into the historical drivers of invasion events and aid in forecasting future patterns. If you then add spatiotemporal information, the analysis becomes even more complete.

  4. Brain Tumor images with tumor location coordinates

    • kaggle.com
    zip
    Updated Feb 18, 2022
    Cite
    Gonzalo Recio (2022). Brain Tumor images with tumor location coordinates [Dataset]. https://www.kaggle.com/datasets/gonzalorecioc/brain-tumor-images-with-tumor-location-coordinates
    Explore at:
    Available download formats: zip (34896050 bytes)
    Dataset updated
    Feb 18, 2022
    Authors
    Gonzalo Recio
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    This is a modified version of Preet Viradiya's dataset "Brian Tumor Dataset": all the tumor images have been preprocessed and normalized, and the tumor-location metadata has been manually gathered into a separate dataset.

    The full preprocessing sequence is detailed in the first half of this notebook on the original dataset: Brain tumor image preprocessing & clasifier

    DISCLAIMER: I am no neuroscientist, so this data should only be used for practice purposes, as some of the tumor data location is bound to be inaccurate or plainly wrong.

    Content

    The data is split into two datasets:
    1. image_df contains 2500 separate 128x128px images of cancerous brain scans, one per row. Reshaping a row into a 128x128 array is necessary in order to display it correctly.
    2. data_df contains four integers per entry: the first two are the coordinates of the top-left corner of the approximate rectangle containing the tumor in the same-index image, and the following two are the rectangle's width and height, respectively.

    An example of loading and displaying data from this dataset has been included into the notebooks section under the name Dataset Usage Basic Example.
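
    In that spirit, a minimal sketch (the CSV names and exact column layout are assumptions; defer to the Dataset Usage Basic Example notebook for the canonical version):

    # Sketch: reshape one row into a 128x128 scan and draw the tumor rectangle
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches

    image_df = pd.read_csv("image_df.csv")  # hypothetical file name
    data_df = pd.read_csv("data_df.csv")    # hypothetical file name

    i = 0
    img = image_df.iloc[i].to_numpy().reshape(128, 128)  # one row -> one scan
    x, y, w, h = data_df.iloc[i, :4]                     # top-left corner, then width/height

    fig, ax = plt.subplots()
    ax.imshow(img, cmap="gray")
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="red"))
    plt.show()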

    Acknowledgements

    Thanks to Preet Viradiya for providing the original images

    Inspiration

    This dataset's goal is to find a way to improve a hypothetical brain-scan classifier. The question "does this brain have cancer?" has been answered in the original dataset; using regression and this modified dataset, not only can the classification question be answered, but a model can also be trained to point out exactly where the tumor is located.

  5. Emissions by Country

    • kaggle.com
    Updated Mar 10, 2024
    Cite
    The Devastator (2024). Emissions by Country [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Emissions by Country

    Quantifying Sources and Emission Levels

    By [source]

    About this dataset

    This dataset provides an in-depth look at global CO2 emissions at the country level, allowing for a better understanding of how much each country contributes to the cumulative human impact on climate. It contains information on total emissions as well as emissions from coal, oil, gas, cement production, flaring, and other sources. The data also provides a breakdown of per capita CO2 emissions per country, showing which countries lead in pollution levels and identifying potential areas where reduction efforts should be concentrated. This dataset is essential for anyone who wants to learn about their own environmental footprint or conduct research on international development trends.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset provides a country-level survey of global fossil CO2 emissions, including total emissions, emissions from coal, oil, gas, cement, flaring and other sources as well as per capita emissions.

    For researchers looking to quantify global CO2 emission levels by country over time and understand the sources of these emissions this dataset can be a valuable resource.

    The data is organized using the following columns:
    
    • Country: the name of the country
    • ISO 3166-1 alpha-3: the three-letter code for the country
    • Year: the year of the survey data
    • Total: the total amount of CO2 emitted by the country in that year
    • Coal: the amount of CO2 emitted by coal in that year
    • Oil: the amount emitted by oil
    • Gas: the amount emitted by gas
    • Cement: the amount emitted by cement production
    • Flaring: flaring emission levels
    • Other: other sources, such as industrial processes
    
    In addition, there is one extra column, Per Capita, which provides insight into how much carbon dioxide is emitted per individual in each country.

    To make use of these columns, you can sum the Total column for a specific region, work out how much each source contributes to the Total (for example, as a percentage), or construct dashboard visualizations to explore which sources are responsible for higher emission levels across different countries or clusters of similar countries. You could also examine whether countries focusing on Flaring (emissions associated with burning off natural gas while drilling) can improve their overall fossil-fuel carbon-emission profiles.
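
    For instance, a hedged sketch of the per-source breakdown (the file name comes from the Columns section below; the country and year values are arbitrary examples):

    # Sketch: share of each source in one country's total emissions for one year
    import pandas as pd

    df = pd.read_csv("GCB2022v27_MtCO2_flat.csv")
    row = df[(df["Country"] == "United States") & (df["Year"] == 2020)].iloc[0]  # example values
    for source in ["Coal", "Oil", "Gas", "Cement", "Flaring", "Other"]:
        print(f"{source}: {100 * row[source] / row['Total']:.1f}%")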

    The main purpose behind this dataset is to give government bodies, private organizations, universities, NGOs, and research agencies alike the detailed, comprehensive, verified information they need to apply analytical techniques, track environmental changes across regions, and develop efficient, directed ways of managing them.

    With insights gleaned from this dataset, one can begin to identify strategies for pollutant mitigation and combating climate change, and make decisions centered on sustainable development with continent-wide unified plans and policy implementations, while keeping an eye out for evidence of regional discrepancies. Improving quality of life is no quick or easy task, but "Global Fossil Carbon Dioxide Emissions: Country-Level Survey 2002-2022" could be exactly what you need.

    Research Ideas

    • Using the per capita emissions data, develop a reporting system to track countries' progress in meeting carbon emission targets and give policy recommendations for how countries can reach those targets more quickly.
    • Analyze the correlation between different fossil fuel sources and CO2 emissions to understand how best to reduce CO2 emissions at a country-level.
    • Create an interactive map showing global CO2 levels over time that allows users to visualize trends by country or region across all fossil fuel sources

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: GCB2022v27_MtCO2_flat.csv | Column name | Description ...

  6. Data from: Open Images

    • kaggle.com
    • opendatalab.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Open Images [Dataset]. https://www.kaggle.com/bigquery/open-images
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Labeled datasets are useful in machine learning research.

    Content

    This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.

    Tables:
    • annotations_bbox
    • dict
    • images
    • labels

    Update Frequency: Quarterly

    Querying BigQuery Tables

    Fork this kernel to get started.
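
    As a hedged illustration, a query against these tables from Python might look like the sketch below. The choice of the dict table and its column names are assumptions based on the table list above; verify the schema in the BigQuery console before relying on them.

    # Sketch: list label codes whose display names mention "bus"
    # (table/column names assumed -- check the dataset schema first)
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT label_name, label_display_name
        FROM `bigquery-public-data.open_images.dict`
        WHERE label_display_name LIKE '%bus%'
    """
    for row in client.query(query).result():
        print(row.label_name, row.label_display_name)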

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images

    https://cloud.google.com/bigquery/public-data/openimages

    APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.

    Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.

    The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

    Banner Photo by Mattias Diesel from Unsplash.

    Inspiration

    • Which labels are in the dataset?
    • Which labels have "bus" in their display names?
    • How many images of a trolleybus are in the dataset?
    • What are some landing pages of images with a trolleybus?
    • Which images with cherries are in the training set?

  7. Fashion Products Images

    • kaggle.com
    Updated Feb 21, 2024
    Cite
    Bhavik Jikadara (2024). Fashion Products Images [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/e-commerce-products-images
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Kaggle
    Authors
    Bhavik Jikadara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    • The growing e-commerce industry presents us with a large dataset waiting to be scraped and researched. In addition to professionally shot high-resolution product images, we have multiple label attributes describing each product, manually entered while cataloging, as well as descriptive text commenting on the product characteristics.

    Content

    • Each product is identified by an ID like 42431. You will find a map of all the products in styles.csv. From here, you can fetch the image for this product from images/42431.jpg. To get started easily, we have exposed some key product categories and their display names in styles.csv.

    Inspiration

    So what can you try building? Here are some suggestions (see the sketch after this list):
    • Start with an image classifier. Use the masterCategory column from styles.csv and train a convolutional neural network.
    • Try adding more sophisticated classification by predicting the other category labels in styles.csv.
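
    A hedged starter sketch for the first suggestion. The file layout and column names follow the description above; everything else (image size, architecture, epoch count) is an illustrative assumption, not a recommended configuration.

    # Sketch: train a small CNN on masterCategory (assumed paths/columns)
    import pandas as pd
    import tensorflow as tf

    styles = pd.read_csv("styles.csv", on_bad_lines="skip")
    styles["filename"] = styles["id"].astype(str) + ".jpg"

    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1 / 255)
    train = datagen.flow_from_dataframe(
        styles, directory="images", x_col="filename", y_col="masterCategory",
        target_size=(96, 96))

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(96, 96, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(styles["masterCategory"].nunique(), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(train, epochs=3)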

  8. Men's Mile Run World Record Progression History

    • kaggle.com
    Updated Jan 14, 2023
    Cite
    The Devastator (2023). Men's Mile Run World Record Progression History [Dataset]. https://www.kaggle.com/datasets/thedevastator/men-s-mile-run-world-record-progression-history/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 14, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Men's Mile Run World Record Progression History (1861-Present)

    Examining the Athlete, Nationality and Venue Influence on Race Times

    By Ben Jones [source]

    About this dataset

    This remarkable dataset chronicles the world record progression of the men's mile run, containing detailed information on each athlete's time, name, nationality, the date of the accomplishment, and the location of the event. It allows us to look back in history and get a comprehensive overview of how this track event has progressed over time. Analyzing this information can help us understand how training and technology have improved the event over the years, as well as study different athletes' performances and learn how some athletes have pushed beyond their limits or fallen short. This valuable resource is essential for anyone intrigued by cutting-edge achievements in men's mile world records. Powerful insights from this dataset can lend perspective on our own personal goals and suggest how we might keep pushing our physical boundaries by studying past successes. Explore and comprehend for yourself what it means to be a true athlete at heart!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This guide provides an introduction to how best to use this dataset to analyze various aspects of the men's mile run world records. We will focus on specific fields such as date, athlete name and nationality, time taken, and auto status, using statistical methods and graphical displays of data.

    In order to use this data effectively, it is important that you understand what each field measures:
    • Time: The amount of time it took an athlete to finish a race, measured in minutes and seconds (example: 3:54).
    • Auto: Whether or not a pacemaker was used during a specific race (example: yes/no).
    • Athlete Name & Nationality: The name and nationality of the athlete who set the record (example: Usain Bolt, Jamaica).
    • Date: The year a specific record was set (example: 2021).
    • Venue: The location at which the record was set (example: London Olympic Stadium).

    Now that you understand what each field measures, let's discuss ways to use these features. Analyzing trends in historical sporting performances has long been used as a means of understanding changes brought about by new training methods and technologies over time. This can be done with our dataset using basic statistical displays like bar graphs and averages, or more advanced methods such as regression analysis or even Bayesian approaches. The first thing anyone dealing with this sort of data should do is inspect it for wacky outliers before beginning more rigorous analysis; if you discover any unreasonable values, it is best to discard them before building models or readings on top of them (this sort of elimination is common practice). After cleaning your workspace, move on to building interactive visual displays by plotting different columns against one another, e.g., plotting time against date lets us see changes over time from 1861 until now. Additionally, plotting time against Auto lets us see any
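
    For example, a minimal parsing-and-plotting sketch (the CSV name is hypothetical; the column names follow the field descriptions above):

    # Sketch: convert "m:ss" record times to seconds and plot the progression
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("mile_world_record_progression.csv")  # hypothetical file name

    def to_seconds(t):
        minutes, seconds = str(t).split(":")
        return int(minutes) * 60 + float(seconds)

    df["seconds"] = df["Time"].apply(to_seconds)
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

    df.sort_values("Date").plot(x="Date", y="seconds", marker="o", legend=False)
    plt.ylabel("record time (s)")
    plt.show()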

    Research Ideas

    • Comparing individual athletes and identifying those who have consistently pushed the event to higher levels of performance.
    • Analyzing national trends related to improvement in track records over time, based on differences in training and technology.
    • Creating a heatmap to visualize the progression of track records around the world and locate regions with a particularly strong historical performance in this event

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors.
    
    • You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    • You must:
    • Give appropriate credit: provide a link to the license, and indicate if changes were made.
    • ShareAlike: you must distribute your contributions under the same license as the original.
    • ...

  9. Be Slavery Free Chocolate 2022

    • kaggle.com
    Updated Oct 11, 2022
    Cite
    Malcolm Gin (2022). Be Slavery Free Chocolate 2022 [Dataset]. https://www.kaggle.com/datasets/malcolmgin/be-slavery-free-chocolate-2022
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Malcolm Gin
    Description

    From the Be Slavery Free watchdog group. Hand-transcribed into a Python data frame in my Enhanced Cacao Data Gathering notebook and saved out as CSV. The data is as presented in the graphical PDF scorecard document.

    I encoded the colored bunny and egg values as numbers:
    • 1 = "Leading the industry on policy"
    • 2 = "Starting to implement good policies"
    • 3 = "Needs more work on policy and implementation"
    • 4 = "Needs to catch up with the industry"
    • 0 = "Did not respond to survey; lacking in transparency"

    Note: in my manual transcription, a single 0 was propagated across all ratings columns for companies that did not respond to the industry survey but were listed on the scorecard with a black egg or bunny.

    The SubsidiaryIndustry column would probably best be parsed out (the delimiter is '-') and split into separate columns (e.g. "Subsidiary" and "Industry"), or even one-hot encoded (e.g. either generic "Subsidiary1", "Subsidiary2", etc., or specific "Chocolate", "Trader", etc.) with binary values.
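
    A hedged sketch of both options (the CSV file name is hypothetical; the '-' delimiter comes from the description above):

    # Sketch: split SubsidiaryIndustry on '-' or one-hot encode its tokens
    import pandas as pd

    df = pd.read_csv("be_slavery_free_chocolate.csv")  # hypothetical file name

    parts = df["SubsidiaryIndustry"].str.split("-", expand=True)   # separate columns
    dummies = df["SubsidiaryIndustry"].str.get_dummies(sep="-")    # binary columns
    print(dummies.head())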

    Imported for cross-referencing with the Flavors of Cacao datasets scraped by the import script or analyzed by various cacao analytics exercises.

    The 2021 scorecard is also available (though I haven't personally transcribed it yet).

    The latest version adds two more CSV files, transformations of the first file provided here:
    • be_slavery_free_chocolate_normalized.csv takes the odd scale I used for input, derived from the scale the original scorecard uses (1-6 expressed in green through red, plus 0 for missing values), and refactors and normalizes it to the 0-to-1 scale used by, for example, the stars() plot in R.
    • be_slavery_free_chocolate_normalized_split.csv takes the normalized set and splits SubsidiaryIndustry into a separate row for each "-"-delimited value. I also manually went through the resulting data frame to remove duplicates, e.g. for traders/manufacturers/processors.

    Both of these data sets can more easily be used with the stars() plotting function and with other older functions that require normalized data. For the stars() function specifically, be sure to use one of the text rows as row names (with row.names(df) = df$Company, for example), since the function implicitly expects to use the row name as the star plot label in a faceted display.

    Photo by Ákos Helgert: https://www.pexels.com/photo/yellow-cacao-fruit-8900912/

  10. Agentic_AI_Applications_2025

    • kaggle.com
    Updated May 10, 2025
    Cite
    Hajra Amir (2025). Agentic_AI_Applications_2025 [Dataset]. https://www.kaggle.com/datasets/hajraamir21/agentic-ai-applications-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hajra Amir
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive overview of various Agentic AI (autonomous AI) applications across multiple industries in 2025. It contains detailed records of how AI is being utilized to automate complex tasks, improve efficiency, and generate measurable outcomes. The dataset is designed to help researchers, data scientists, and businesses understand the current state and potential of Agentic AI in different sectors.
    
    Dataset features:
    
    Industry: The sector where Agentic AI is applied (e.g., Healthcare, Finance, Manufacturing).

    Application Area: The specific task or function performed by the AI agent (e.g., Fraud Detection, Predictive Maintenance).

    AI Agent Name: The name of the AI system or agent deployed (e.g., HealthAI Monitor, FinSecure Agent).

    Task Description: A brief description of the AI's function or role.

    Technology Stack: The technologies powering the AI (e.g., Machine Learning, NLP, Computer Vision).

    Outcome Metrics: The measurable impact of the AI deployment (e.g., 30% reduction in ER visits).

    Deployment Year: The year the AI system was deployed (ranging from 2023 to 2025).

    Geographical Region: The region where the AI application is implemented (e.g., North America, Asia, Europe).
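
    A quick exploratory sketch (the CSV name is an assumption from the dataset slug; the column names follow the feature list above):

    # Sketch: how many distinct application areas each industry covers
    import pandas as pd

    df = pd.read_csv("agentic_ai_applications_2025.csv")  # hypothetical file name
    print(df.groupby("Industry")["Application Area"]
            .nunique().sort_values(ascending=False))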

  11. Australian Election 2019 Tweets

    • kaggle.com
    Updated May 21, 2019
    Cite
    Tania J (2019). Australian Election 2019 Tweets [Dataset]. https://www.kaggle.com/taniaj/australian-election-2019-tweets/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 21, 2019
    Dataset provided by
    Kaggle
    Authors
    Tania J
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Australia
    Description

    Context

    During the 2019 Australian election I noticed that almost everything I was seeing on Twitter was unusually left-wing. So I decided to scrape some data and investigate. Unfortunately my sentiment analysis has so far been too inaccurate to come to any useful conclusions. I decided to share the data so that others may be able to help with the sentiment or any other interesting analysis.

    Content

    Over 180,000 tweets collected using Twitter API keyword search between 10.05.2019 and 20.05.2019. Columns are as follows:

    • created_at: Date and time of tweet creation
    • id: Unique ID of the tweet
    • full_text: Full tweet text
    • retweet_count: Number of retweets
    • favorite_count: Number of likes
    • user_id: User ID of tweet creator
    • user_name: Username of tweet creator
    • user_screen_name: Screen name of tweet creator
    • user_description: Description on tweet creator's profile
    • user_location: Location given on tweet creator's profile
    • user_created_at: Date the tweet creator joined Twitter

    The latitude and longitude of user_location is also available in location_geocode.csv. This information was retrieved using the Google Geocode API.
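
    A hedged sketch of that join (the geocode file's column names are assumptions; the tweets file name is hypothetical):

    # Sketch: attach lat/long to tweets via the user_location string
    import pandas as pd

    tweets = pd.read_csv("auspol2019.csv")       # hypothetical file name
    geo = pd.read_csv("location_geocode.csv")    # columns assumed: name, lat, long

    located = tweets.merge(geo, left_on="user_location", right_on="name", how="left")
    print(located[["user_location", "lat", "long"]].dropna().head())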

    Acknowledgements

    Thanks to Twitter for providing the free API.

    Inspiration

    There are a lot of interesting things that could be investigated with this data. Primarily I was interested to do sentiment analysis, before and after the election results were known, to determine whether Twitter users are indeed a left-leaning bunch. Did the tweets become more negative as the results were known?

    Other ideas for investigation include:

    • Take into account retweets and favourites to weight overall sentiment analysis.

    • Which parts of the world are interested in (i.e., tweet about) the Australian elections, apart from Australia?

    • How do the users who tweet about this sort of thing tend to describe themselves?

    • Is there a correlation between when the user joined Twitter and their political views (this assumes the sentiment analysis is already working well)?

    • Predict gender from username/screen name and segment tweet count and sentiment by gender

  12. World Gender Statistics

    • kaggle.com
    zip
    Updated Nov 28, 2016
    Cite
    World Bank (2016). World Gender Statistics [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-gender-statistics/versions/1
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Nov 28, 2016
    Dataset authored and provided by
    World Bank (http://topics.nytimes.com/top/reference/timestopics/organizations/w/world_bank/index.html)
    Area covered
    World
    Description

    The Gender Statistics database is a comprehensive source for the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.

    The Data

    The data is split into several files, the main one being Data.csv, which contains all the variables of interest in this dataset; the others are lists of references and general nation-by-nation information.

    Data.csv contains the following fields:

    Data.csv

    • Country.Name: the name of the country
    • Country.Code: the country's code
    • Indicator.Name: the name of the variable that this row represents
    • Indicator.Code: a unique id for the variable
    • 1960 - 2016: one column EACH for the value of the variable in each year it was available
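
    Given that layout, a minimal reshaping sketch melts the per-year columns into a long format (only the file and field names listed above are assumed):

    # Sketch: wide year columns -> long (Year, Value) rows
    import pandas as pd

    df = pd.read_csv("Data.csv")
    id_cols = ["Country.Name", "Country.Code", "Indicator.Name", "Indicator.Code"]
    year_cols = [c for c in df.columns if c.isdigit()]

    long_df = df.melt(id_vars=id_cols, value_vars=year_cols,
                      var_name="Year", value_name="Value")
    print(long_df.head())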

    The other files

    I couldn't find any metadata for these, and I'm not qualified to guess at what each of the variables means. I'll list the variables for each file, and if anyone has any suggestions (or, even better, actual knowledge/citations) as to what they mean, please leave a note in the comments and I'll add your info to the data description.

    Country-Series.csv

    • CountryCode
    • SeriesCode
    • DESCRIPTION

    Country.csv

    • Country.Code
    • Short.Name
    • Table.Name
    • Long.Name
    • 2-alpha.code
    • Currency.Unit
    • Special.Notes
    • Region
    • Income.Group
    • WB-2.code
    • National.accounts.base.year
    • National.accounts.reference.year
    • SNA.price.valuation
    • Lending.category
    • Other.groups
    • System.of.National.Accounts
    • Alternative.conversion.factor
    • PPP.survey.year
    • Balance.of.Payments.Manual.in.use
    • External.debt.Reporting.status
    • System.of.trade
    • Government.Accounting.concept
    • IMF.data.dissemination.standard
    • Latest.population.census
    • Latest.household.survey
    • Source.of.most.recent.Income.and.expenditure.data
    • Vital.registration.complete
    • Latest.agricultural.census
    • Latest.industrial.data
    • Latest.trade.data
    • Latest.water.withdrawal.data

    FootNote.csv

    • CountryCode
    • SeriesCode
    • Year
    • DESCRIPTION

    Series-Time.csv

    • SeriesCode
    • Year
    • DESCRIPTION

    Series.csv

    • Series.Code
    • Topic
    • Indicator.Name
    • Short.definition
    • Long.definition
    • Unit.of.measure
    • Periodicity
    • Base.Period
    • Other.notes
    • Aggregation.method
    • Limitations.and.exceptions
    • Notes.from.original.source
    • General.comments
    • Source
    • Statistical.concept.and.methodology
    • Development.relevance
    • Related.source.links
    • Other.web.links
    • Related.indicators
    • License.Type

    Acknowledgements

    This dataset was downloaded from The World Bank's Open Data project. The summary of the Terms of Use of this data is as follows:

    • You are free to copy, distribute, adapt, display or include the data in other products for commercial and noncommercial purposes at no cost subject to certain limitations summarized below.

    • You must include attribution for the data you use in the manner indicated in the metadata included with the data.

    • You must not claim or imply that The World Bank endorses your use of the data, or use The World Bank's logo(s) or trademark(s) in conjunction with such use.

    • Other parties may have ownership interests in some of the materials contained on The World Bank Web site. For example, we maintain a list of some specific data within the Datasets that you may not redistribute or reuse without first contacting the original content provider, as well as information regarding how to contact the original content provider. Before incorporating any data in other products, please check the list: Terms of use: Restricted Data.

    -- [ed. note: this last is not applicable to the Gender Statistics database]

    • The World Bank makes no warranties with respect to the data and you agree The World Bank shall not be liable to you in connection with your use of the data.

    • This is only a summary of the Terms of Use for Datasets Listed in The World Bank Data Catalogue. Please read the actual agreement that controls your use of the Datasets, which is available here: Terms of use for datasets. Also see World Bank Terms and Conditions.

