72 datasets found
  1. 📣 Ad Click Prediction Dataset

    • kaggle.com
    Updated Sep 7, 2024
    Cite
    Ciobanu Marius (2024). 📣 Ad Click Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/ad-click-prediction-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Ciobanu Marius
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    About

    This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.

    Features

    • id: Unique identifier for each user.
    • full_name: User's name formatted as "UserX" for anonymity.
    • age: Age of the user (ranging from 18 to 64 years).
    • gender: The gender of the user (categorized as Male, Female, or Non-Binary).
    • device_type: The type of device used by the user when viewing the ad (Mobile, Desktop, Tablet).
    • ad_position: The position of the ad on the webpage (Top, Side, Bottom).
    • browsing_history: The user's browsing activity prior to seeing the ad (Shopping, News, Entertainment, Education, Social Media).
    • time_of_day: The time when the user viewed the ad (Morning, Afternoon, Evening, Night).
    • click: The target label indicating whether the user clicked on the ad (1 for a click, 0 for no click).

    Goal

    The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean and explore the data, then apply machine-learning models to make and evaluate predictions; this is a genuinely challenging task for data of this kind. The data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online ads.
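    As a quick illustration of the task described above, here is a minimal, hedged Python sketch of a baseline classifier. The column names follow the feature list above; the file name and preprocessing choices are assumptions, not part of the dataset documentation.

```python
# Baseline sketch only: the file name and preprocessing are assumptions;
# column names come from the dataset description above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("ad_click_dataset.csv")  # hypothetical local file name

categorical = ["gender", "device_type", "ad_position", "browsing_history", "time_of_day"]
numeric = ["age"]
df = df.dropna(subset=categorical + numeric + ["click"])  # the description notes cleaning is needed

X, y = df[categorical + numeric], df["click"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough",  # leaves `age` unchanged
    )),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
model.fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```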

  2. Context Ad Clicks Dataset

    • kaggle.com
    Updated Feb 9, 2021
    Cite
    Möbius (2021). Context Ad Clicks Dataset [Dataset]. https://www.kaggle.com/arashnic/ctrtest/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 9, 2021
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Möbius
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The dataset was generated by an e-commerce website that sells a variety of products on its online platform. It records the behaviour of its customers and stores it as a log. Most of the time, however, users do not buy a product instantly; there is a time gap during which the customer might surf the internet and perhaps visit competitor websites. To improve product sales, the website owner has hired an adtech company, which built a system that shows ads for the owner's products on its partner websites. If a user visits the owner's website and searches for a product, and then visits these partner websites or apps, the previously viewed items or similar items are shown to them as ads. If the user clicks such an ad, they are redirected to the owner's website, where they might buy the product.

    The task is to predict the probability that a user will click the ad shown to them on the partner websites over the next 7 days, on the basis of historical view-log data, ad-impression data, and user data.

    Content

    You are provided with the view log of users (2018/10/15 - 2018/12/11) and the product descriptions collected from the owner's website. We also provide training and test data containing details of ad impressions at the partner websites (train + test). The train data contains the impression logs during 2018/11/15 – 2018/12/13, along with a label specifying whether the ad was clicked or not. Your model will be evaluated on the test data, which contains impression logs during 2018/12/12 – 2018/12/18 without labels. You are provided with the following files:

    • train.zip: contains 3 files, described below:
      • train.csv
      • view_log.csv
      • item_data.csv
    • test.csv: contains the impressions for which participants need to predict the click rate
    • sample_submission.csv: contains the format in which you have to submit your predictions

    Inspiration

    • Predict the probability of a user clicking the ad shown to them on the partner websites for the next 7 days, on the basis of historical view-log data, ad-impression data, and user data.

    The evaluation metric could be the area under the ROC curve (ROC AUC) between the predicted probability and the observed target.
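    To make the stated metric concrete, below is a minimal sketch of computing ROC AUC on a held-out split. The label column name (`is_click`) and the numeric-features-only shortcut are assumptions; a real solution would engineer features from the view log and item data.

```python
# Hedged sketch: `is_click` as the label name is an assumption.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")
y = train["is_click"]                                   # assumed label column
X = train.drop(columns=["is_click"]).select_dtypes("number").fillna(0)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
```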

  3. OGD Portal: Daily usage by record (since January 2024)

    • data.europa.eu
    csv, excel xls, json +5
    Updated Apr 6, 2025
    + more versions
    Cite
    kanton-basel-landschaft (2025). OGD Portal: Daily usage by record (since January 2024) [Dataset]. https://data.europa.eu/data/datasets/12610-kanton-basel-landschaft?locale=en
    Explore at:
    Available download formats: n3, rdf xml, csv, json-ld, json, rdf turtle, parquet, excel xls
    Dataset updated
    Apr 6, 2025
    Dataset authored and provided by
    kanton-basel-landschaft
    License

    http://dcat-ap.ch/vocabulary/licenses/terms_by

    Description

    The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL. The columns are:

    • date: the day the usage was measured.
    • dataset_title: the title of the dataset.
    • dataset_id: the technical ID of the dataset.
    • visitors: the number of daily visitors to the record. Visitors are recorded by counting the unique IP addresses that accessed the record on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
    • interactions: all interactions with any record on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).

    Remarks: Only calls to publicly available datasets are shown. IP addresses and interactions of users with a login of the Canton of Basel-Landschaft (in particular, employees of the specialist and coordination office OGD) are removed from the dataset before publication and therefore not shown. Calls from actors clearly identifiable as bots by the user-agent header are also not shown. Combinations of dataset and date for which no use occurred (visitors == 0 & interactions == 0) are not shown. Due to synchronization problems, data may be missing for individual days.
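    For example, a short pandas sketch that totals usage per dataset; the file name and the `date` column name are assumptions based on the description above:

```python
# Assumed file/column names per the description (date, dataset_title,
# dataset_id, visitors, interactions).
import pandas as pd

usage = pd.read_csv("ogd_daily_usage.csv", parse_dates=["date"])
top10 = (usage.groupby("dataset_title")[["visitors", "interactions"]]
              .sum()
              .sort_values("interactions", ascending=False)
              .head(10))
print(top10)
```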

  4. Data from: Google Analytics & Twitter dataset from a movies, TV series and...

    • figshare.com
    • portalcientificovalencia.univeuropea.com
    txt
    Updated Feb 7, 2024
    Cite
    Víctor Yeste (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. http://doi.org/10.6084/m9.figshare.16553061.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Víctor Yeste
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Víctor Yeste, Universitat Politècnica de València.

    The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables. In this case, due to the need to integrate data from two separate areas (web publishing and the analysis of shares and related topics on Twitter), the author opted to access both the Google Analytics v4 Reporting API and the Twitter Standard API programmatically, always respecting their limits. The website analyzed is hellofriki.com, an online media outlet whose primary intention is to meet the need for information on certain topics, publishing a large volume of content daily in the form of news, analyses, reports, interviews, and many other formats. All of this content falls under the sections of cinema, series, video games, literature, and comics.

    This dataset has contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

    Data have been obtained from each breaking-news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

    tesis_followers: user ID list of the media account's followers.

    tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
    • status_id: tweet ID
    • created_at: date of publication
    • text: content of the tweet
    • path: URL extracted after processing the shortened URL in text
    • post_shared: article ID in WordPress that is being shared
    • retweet_count: number of retweets
    • favorite_count: number of favorites

    tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies, automatic Facebook shares, custom tweets without a link to an article, etc.). Same fields as tesis_hometimeline.

    tesis_posts: data on articles published by the website and processed for analysis.
    • stats_id: analysis ID
    • post_id: article ID in WordPress
    • post_date: article publication date in WordPress
    • post_title: title of the article
    • path: URL of the article on the media website
    • tags: IDs of the WordPress tags related to the article
    • uniquepageviews: unique page views
    • entrancerate: entrance ratio
    • avgtimeonpage: average time on page
    • exitrate: exit ratio
    • pageviewspersession: page views per session
    • adsense_adunitsviewed: number of ads viewed by users
    • adsense_viewableimpressionpercent: ad display ratio
    • adsense_ctr: ad click ratio
    • adsense_ecpm: estimated ad revenue per 1000 page views

    tesis_stats: data from a particular analysis, performed for each published breaking-news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
    • id: ID of the analysis
    • phase: phase of the thesis in which the analysis was carried out (currently all are 1)
    • time: "0" if at the time of publication, "1" if 14 days later
    • start_date: date and time of the measurement on the day of publication
    • end_date: date and time of the measurement made 14 days later
    • main_post_id: ID of the published article to be analysed
    • main_post_theme: main section of the published article
    • superheroes_theme: "1" if about superheroes, "0" if not
    • trailer_theme: "1" if a trailer, "0" if not
    • name: empty field, allowing a custom name to be added manually
    • notes: empty field, allowing personalized notes to be added manually (e.g. that a tag was removed manually for being considered too generic, despite the editor having added it)
    • num_articles: number of articles analysed
    • num_articles_with_traffic: number of analysed articles with traffic (taken into account for traffic analysis)
    • num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
    • num_terms: number of terms analysed
    • uniquepageviews_total: total page views
    • uniquepageviews_mean: average page views
    • entrancerate_mean: average entrance ratio
    • avgtimeonpage_mean: average duration of visits
    • exitrate_mean: average exit ratio
    • pageviewspersession_mean: average page views per session
    • total: total of ads viewed
    • adsense_adunitsviewed_mean: average of ads viewed
    • adsense_viewableimpressionpercent_mean: average ad display ratio
    • adsense_ctr_mean: average ad click ratio
    • adsense_ecpm_mean: estimated ad revenue per 1000 page views
    • total: total income
    • retweet_count_mean: average income
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • terms_ini_num_tweets: total tweets on the terms on the day of publication
    • terms_ini_retweet_count_total: total retweets on the terms on the day of publication
    • terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
    • terms_ini_favorite_count_total: total favorites on the terms on the day of publication
    • terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
    • terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who recently published a tweet mentioning the terms on the day of publication
    • terms_ini_user_num_followers_mean: average followers of users who mentioned the terms on the day of publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who mentioned the terms on the day of publication
    • terms_ini_user_age_mean: average age in days of users who mentioned the terms on the day of publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets mentioning the terms on the day of publication
    • terms_end_num_tweets: total tweets on the terms 14 days after publication
    • terms_ini_retweet_count_total: total retweets on the terms 14 days after publication
    • terms_ini_retweet_count_mean: average retweets on the terms 14 days after publication
    • terms_ini_favorite_count_total: total favorites on the terms 14 days after publication
    • terms_ini_favorite_count_mean: average favorites on the terms 14 days after publication
    • terms_ini_followers_talking_rate: ratio of media Twitter account followers who recently posted a tweet mentioning the terms 14 days after publication
    • terms_ini_user_num_followers_mean: average followers of users who mentioned the terms 14 days after publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who mentioned the terms 14 days after publication
    • terms_ini_user_age_mean: average age in days of users who mentioned the terms 14 days after publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets mentioning the terms 14 days after publication

    tesis_terms: data on the terms (tags) related to the processed articles.
    • stats_id: analysis ID
    • time: "0" if at the time of publication, "1" if 14 days later
    • term_id: term ID (tag) in WordPress
    • name: name of the term
    • slug: URL of the term
    • num_tweets: number of tweets
    • retweet_count_total: total retweets
    • retweet_count_mean: average retweets
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • followers_talking_rate: ratio of followers of the media Twitter account who recently published a tweet mentioning the term
    • user_num_followers_mean: average followers of users who were talking about the term
    • user_num_tweets_mean: average number of tweets published by users who were talking about the term
    • user_age_mean: average age in days of users who were talking about the term
    • url_inclusion_rate: URL inclusion ratio
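    The tables can be joined on their WordPress article IDs. A hedged sketch, assuming the tables above have been exported to CSV under their table names:

```python
# Illustrative only: assumes CSV exports of the database tables described above.
import pandas as pd

posts = pd.read_csv("tesis_posts.csv")
tweets = pd.read_csv("tesis_hometimeline.csv")

# post_shared (tweet side) references post_id (article side).
merged = tweets.merge(posts, left_on="post_shared", right_on="post_id")
print(merged[["post_title", "retweet_count", "favorite_count",
              "uniquepageviews"]].head())
```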

  5. TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR -...

    • datarade.ai
    .json, .csv, .xls
    Updated Sep 16, 2024
    Cite
    TagX (2024). TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR - CCPA Compliant [Dataset]. https://datarade.ai/data-products/tagx-web-browsing-clickstream-data-300k-users-north-america-tagx
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Sep 16, 2024
    Dataset authored and provided by
    TagX
    Area covered
    United States
    Description

    TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and EU

    Unique Insights into Online User Behavior

    TagX Web Browsing clickstream Data offers an unparalleled window into the digital lives of 300K users across North America and the European Union. This comprehensive dataset stands out in the market due to its breadth, depth, and stringent compliance with data protection regulations.

    What Makes Our Data Unique?

    • Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies.
    • Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments.
    • GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards.
    • Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors.
    • Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.

    Data Sourcing: Ethical and Transparent

    Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:

    • Voluntary user participation through clear opt-in mechanisms
    • Regular audits of data collection methods to ensure ongoing compliance
    • Collaboration with privacy experts to implement best practices in data anonymization
    • Continuous monitoring of regulatory landscapes to adapt our processes as needed

    Primary Use Cases and Verticals

    TagX Web Browsing clickstream Data serves a multitude of industries and use cases, including but not limited to:

    Digital Marketing and Advertising:
    • Audience segmentation and targeting
    • Campaign performance optimization
    • Competitor analysis and benchmarking

    E-commerce and Retail:
    • Customer journey mapping
    • Product recommendation enhancements
    • Cart abandonment analysis

    Media and Entertainment:
    • Content consumption trends
    • Audience engagement metrics
    • Cross-platform user behavior analysis

    Financial Services:
    • Risk assessment based on online behavior
    • Fraud detection through anomaly identification
    • Investment trend analysis

    Technology and Software:
    • User experience optimization
    • Feature adoption tracking
    • Competitive intelligence

    Market Research and Consulting:
    • Consumer behavior studies
    • Industry trend analysis
    • Digital transformation strategies

    Integration with Broader Data Offering

    TagX Web Browsing clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:

    • Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints.
    • Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey.
    • Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts.
    • Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.

    By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives.

    Data Quality and Scale

    We pride ourselves on delivering high-quality, reliable data at scale:

    • Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions.
    • Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency.
    • Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage.
    • Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies.
    • Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.

    Empowering Data-Driven Decision Making

    In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...

  6. NLP Expert QA Dataset

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    Datasimple (2025). NLP Expert QA Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/c030902d-7b02-48a2-b32f-8f7140dd1de7
    Explore at:
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset, QASPER: NLP Questions and Evidence, is an exceptional collection of over 5,000 questions and answers focused on Natural Language Processing (NLP) papers. It has been crowdsourced from experienced NLP practitioners, with each question meticulously crafted based solely on the titles and abstracts of the respective papers. The answers provided are expertly enriched with evidence taken directly from the full text of each paper. QASPER features structured fields including 'qas' for questions and answers, 'evidence' for supporting information, paper titles, abstracts, figures and tables, and full text. This makes it a valuable resource for researchers aiming to understand how practitioners interpret NLP topics and to validate solutions for problems found in existing literature. The dataset contains 5,049 questions spanning 1,585 distinct papers.

    Columns

    • title: The title of the paper. (String)
    • abstract: A summary of the paper. (String)
    • full_text: The full text of the paper. (String)
    • qas: Questions and answers about the paper. (Object)
    • figures_and_tables: Figures and tables from the paper. (Object)
    • id: Unique identifier for the paper.

    Distribution

    The QASPER dataset comprises 5,049 questions across 1,585 papers. It is distributed across five files in .csv format, with one additional .json file for figures and tables. These include two test datasets (test.csv and validation.csv), two train datasets (train-v2-0_lessons_only_.csv and trainv2-0_unsplit.csv), and a figures dataset (figures_and_tables_.json). Each CSV file contains distinct datasets with columns dedicated to titles, abstracts, full texts, and Q&A fields, along with evidence for each paper mentioned in the respective rows.

    Usage

    This dataset is ideal for various applications, including:

    • Developing AI models to automatically generate questions and answers from paper titles and abstracts.
    • Enhancing machine learning algorithms by combining answers with evidence to discover relationships between papers.
    • Creating online forums for NLP practitioners, using dataset questions to spark discussion within the community.
    • Conducting basic descriptive statistics or advanced predictive analytics, such as logistic regression or naive Bayes models.
    • Summarising basic crosstabs between any two variables, like titles and abstracts.
    • Correlating title lengths with the number of words in their corresponding abstracts to identify patterns (see the sketch below).
    • Utilising text mining technologies like topic modelling, machine learning techniques, or automated processes to summarise underlying patterns.
    • Filtering terms relevant to specific research hypotheses and processing them via web crawlers, search engines, or document similarity algorithms.
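    As a concrete example of the title-length correlation suggested above, a minimal sketch follows. The file name comes from the distribution notes; reading it as a flat CSV with `title` and `abstract` columns is an assumption.

```python
# Sketch of one suggested use: correlate title length with abstract word count.
import pandas as pd

papers = pd.read_csv("trainv2-0_unsplit.csv")   # one of the listed train files
title_len = papers["title"].str.len()
abstract_words = papers["abstract"].str.split().str.len()
print("Pearson r:", title_len.corr(abstract_words))
```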

    Coverage

    The dataset has a GLOBAL region scope. It focuses on papers within the field of Natural Language Processing. The questions and answers are crowdsourced from experienced NLP practitioners. The dataset was listed on 22/06/2025.

    License

    CC0

    Who Can Use It

    This dataset is highly suitable for:

    • Researchers seeking insights into how NLP practitioners interpret complex topics.
    • Those requiring effective validation for developing clear-cut solutions to problems encountered in existing NLP literature.
    • NLP practitioners looking for a resource to stimulate discussions within their community.
    • Data scientists and analysts interested in exploring NLP datasets through descriptive statistics or advanced predictive analytics.
    • Developers and researchers working with text mining, machine learning techniques, or automated text processing.

    Dataset Name Suggestions

    • NLP Expert QA Dataset
    • QASPER: NLP Paper Questions and Evidence
    • Academic NLP Q&A Corpus
    • Natural Language Processing Research Questions

    Attributes

    Original Data Source: QASPER: NLP Questions and Evidence

  7. Agency Voter Registration Activity

    • data.cityofnewyork.us
    • catalog.data.gov
    application/rdfxml +5
    Updated Sep 4, 2024
    + more versions
    Cite
    Mayor's Office of Operations (OPS) (2024). Agency Voter Registration Activity [Dataset]. https://data.cityofnewyork.us/w/kkum-y97z/25te-f2tw?cur=8CErOa7PUs4
    Explore at:
    Available download formats: csv, xml, tsv, application/rdfxml, application/rssxml, json
    Dataset updated
    Sep 4, 2024
    Dataset authored and provided by
    Mayor's Office of Operations (OPS)
    Description

    This dataset captures how many voter registration applications each agency has distributed, how many applications agency staff sent to the Board of Elections, how many staff each agency trained to distribute voter registration applications, whether or not the agency hosts a link to voting.nyc on its website and if so, how many clicks that link received during the reporting period.

  8. ‘Climate Change Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 13, 2018
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Climate Change Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-climate-change-dataset-7e65/4a67af59/?iid=002-150&v=presentation
    Explore at:
    Dataset updated
    Dec 13, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Climate Change Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/climate-change-datae on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Data from World Development Indicators and Climate Change Knowledge Portal on climate systems, exposure to climate impacts, resilience, greenhouse gas emissions, and energy use.

    In addition to the data available here and through the Climate Data API, the Climate Change Knowledge Portal has a web interface to a collection of water indicators that may be used to assess the impact of climate change across over 8,000 water basins worldwide. You may use the web interface to download the data for any of these basins.

    Here is how to navigate to the water data:

    • Go to the Climate Change Knowledge Portal home page (http://climateknowledgeportal.worldbank.org/)
    • Click any region on the map
    • Click a country
    • In the navigation menu, click "Impacts" and then "Water"
    • Click the map to select a specific water basin
    • Click "Click here to get access to data and indicators"

    Please be sure to observe the disclaimers on the website regarding uncertainties and use of the water data.

    Attribution: Climate Change Data, World Bank Group.

    World Bank Data Catalog Terms of Use

    Source: http://data.worldbank.org/data-catalog/climate-change

    This dataset was created by the World Bank and contains around 10,000 samples. Its features include year columns (such as 1993, 1994, and 2009), Series Code, and more.

    How to use this dataset

    • Analyze 1995 in relation to Scale
    • Study the influence of 1998 on Country Code
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit World Bank

    --- Original source retains full ownership of the source dataset ---

  9. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    Updated Apr 22, 2025
    Cite
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Jonathan Pilafas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. Click here to view the dashboard: Dashboard Link. Click here to view the dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard.

    This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. The data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information I could pull from the site, as an open-source tool for March Madness analysis.

    Key features of the dataset include: - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward. - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of. - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

    These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each previous season are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

    Once all of the INT datasets were created, I joined all of the tables together on the team name and season so these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in its full length and its respective acronym, as well as the team name that ESPN currently uses. This reference table is an aggregated view of all of the different conferences a team has been part of since 2002 and the different team names KenPom has used historically, so it is necessary to map all of the teams properly and differentiate historical conferences from current ones.

    From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because current coaching tenure typically correlates with a team's success in the March Madness tournament. I also join a reference table of historical post-season tournament teams (March Madness, NIT, CBI, and CIT), and another reference table flagging teams ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
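    For readers reproducing this outside Domo, here is a hedged pandas sketch of the same idea (standardize historical headers, append seasons, then join sections on team and season). The file names and the rename map are placeholders, not the actual Magic ETL configuration.

```python
# Placeholder file names and rename map; the real pipeline lives in Domo's
# Magic ETL as described above.
import glob
import pandas as pd

RENAME_TO_2025 = {"AdjO": "AdjOE", "AdjD": "AdjDE"}   # hypothetical examples

seasons = []
for path in glob.glob("kenpom_efficiency_*.csv"):      # e.g. one raw CSV per season
    year = int(path.split("_")[-1].removesuffix(".csv"))
    df = pd.read_csv(path).rename(columns=RENAME_TO_2025)
    df["Season"] = year
    seasons.append(df)
efficiency = pd.concat(seasons, ignore_index=True)

# Join another cleaned section on team name + season, as described above.
four_factors = pd.read_csv("kenpom_four_factors_all_seasons.csv")
combined = efficiency.merge(four_factors, on=["Team", "Season"], how="left")
```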

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  10. Job Offers Web Scraping Search

    • kaggle.com
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Offers Web Scraping Search

    Targeted Results to Find the Optimal Work Solution

    By [source]

    About this dataset

    This dataset collects job offers from web scraping, filtered according to specific keywords, locations, and times. It gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skill set, and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, and time frames, as well as other parameters, so you can make a smart choice for your next career opportunity.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

    • Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.

    • Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!

    • Finally, consider when the position is available – look at the Time frame column which gives an indication of when each posting was made as well as if it’s a full-time/ part-time role or even if it’s a casual/temporary position from day one so make sure it meets your requirements first before applying!

    • Additionally, if details such as hours per week or further schedule information are important criteria then there is also info provided under Horari and Temps Oferta columns too! Now that all three criteria have been ticked off - key words, location and time frame - then take a look at Empresa (Company Name) and Nom_Oferta (Post Name) columns too in order to get an idea of who will be employing you should you land the gig!

      All these pieces of data put together should give any motivated individual all they need in order to seek out an optimal work solution - keep hunting good luck!

    Research Ideas

    • Machine learning can be used to group job offers in order to facilitate the identification of similarities and differences between them. This could allow users to target their search for a work solution more precisely.
    • The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better-informed decisions about their career options and goals.
    • It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities, or trends that may previously have gone unnoticed.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: web_scraping_information_offers.csv

    | Column name  | Description                          |
    |:-------------|:-------------------------------------|
    | Nom_Oferta   | Name of the job offer. (String)      |
    | Empresa      | Company offering the job. (String)   |
    | Ubicació     | Location of the job offer. (String)  |
    | Temps_Oferta | Time of the job offer. (String)      |
    | Horari       | Schedule of the job offer. (String)  |
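    A minimal filtering sketch using the documented file and column names; the keyword and location values below are just examples, not part of the dataset:

```python
# Column names come from the table above; the filter values are examples.
import pandas as pd

offers = pd.read_csv("web_scraping_information_offers.csv")
mask = (offers["Nom_Oferta"].str.contains("data", case=False, na=False)
        & offers["Ubicació"].str.contains("Barcelona", case=False, na=False))
print(offers.loc[mask, ["Nom_Oferta", "Empresa", "Horari", "Temps_Oferta"]])
```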

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

  11. Worldwide Soundscapes project meta-data

    • zenodo.org
    Updated Dec 9, 2022
    Cite
    Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海 李; 黎君 董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
    Explore at:
    Dataset updated
    Dec 9, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海 李; 黎君 董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

    The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

    The audio recording criteria justifying inclusion into the meta-database are:

    • Stationary (no transects, towed sensors or microphones mounted on cars)
    • Passive (unattended, no human disturbance by the recordist)
    • Ambient (no spatial or temporal focus on a particular species or direction)
    • Spatially and/or temporally replicated (multiple sites sampled at least at one common daytime or multiple days sampled at least in one common site)

    The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database (see the sketch after the sites table below).

    datasets

    • dataset_id: incremental integer, primary key
    • name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.
    • subset: incremental integer that can be used to distinguish datasets with identical names
    • collaborators: full names of people deemed responsible for the dataset, separated by commas
    • contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.
    • date_added: when the dataset was added (DD/MM/YYYY)
    • URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.
    • URL_project: internet link for further information about the corresponding project
    • DOI_publication: DOI of corresponding publications, separated by comma
    • core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/
    • medium: the physical medium the microphone is situated in
    • protected_area: Whether the sampling sites were situated in protected areas or not, or only some.
    • GADM0: For datasets on land or in territorial waters, Global Administrative Database level0
      https://gadm.org/
    • GADM1: For datasets on land or in territorial waters, Global Administrative Database level1
      https://gadm.org/
    • GADM2: For datasets on land or in territorial waters, Global Administrative Database level2
      https://gadm.org/
    • IHO: For marine locations, the sea area that encompassess all the sampling locations according to the International Hydrographic Organisation. Map here: https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2
    • locality: optional free text about the locality
    • latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees
    • longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees
    • sites_number: number of sites sampled
    • year_start: starting year of the sampling
    • year_end: ending year of the sampling
    • deployment_schedule: description of the sampling schedule, provisional
    • temporal_recording_selection: list environmental exclusion criteria that were used to determine which recording days or times to discard
    • high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz
    • variable_sampling_frequency: does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column of this sheet and indicate the frequencies in the sampling_frequency_kHz column of the deployments sheet
    • sampling_frequency_kHz: frequency the microphone was sampled at (sounds up to half that frequency, the Nyquist limit, will be recorded)
    • variable_recorder:
    • recorder: recorder model used
    • microphone: microphone used
    • freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)
    • collaborator_comments: free-text field for comments by the collaborators
    • validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.
    • validator_name: name of person doing the validation
    • validation_comments: validators: please insert the date when someone was contacted
    • cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link

    datasets-sites

    • dataset_ID: primary key of datasets table
    • dataset_name: lookup field
    • site_ID: primary key of sites table
    • site_name: lookup field

    sites

    • site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web
    • site_name: name or code of sampling site as used in respective projects
    • latitude_numeric: exact numeric degrees coordinates of latitude
    • longitude_numeric: exact numeric degrees coordinates of longitude
    • topography_m: for sites on land: elevation. For marine sites: depth (negative). in meters
    • freshwater_depth_m
    • realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • comments
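    As noted above, the tables join into a database through their keys. A hedged sketch, assuming the three tables are exported as CSV files named after the tables, and keeping the documented key casing (dataset_id vs dataset_ID):

```python
# Assumes CSV exports of the datasets, datasets-sites, and sites tables.
import pandas as pd

datasets = pd.read_csv("datasets.csv")
links = pd.read_csv("datasets-sites.csv")
sites = pd.read_csv("sites.csv")

db = (links[["dataset_ID", "site_ID"]]
      .merge(datasets, left_on="dataset_ID", right_on="dataset_id")
      .merge(sites, on="site_ID"))
print(db[["name", "site_name", "latitude_numeric", "longitude_numeric"]].head())
```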

    deployments

    • dataset_ID: primary key of datasets table
    • dataset_name: lookup field
    • deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.
    • start_date_min: earliest date of deployment start, double-click cell to get date-picker
    • start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker
    • start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?
    • variable_duration_days: is the duration of the deployment variable? in days
    • duration_days: deployment duration per recorder (use the minimum if variable)
    • end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker
    • end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker
    • end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.
    • recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.
    • operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • operation_duration_minutes: duration of operation in minutes, if constant
    • operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.
    • sampling_frequency_kHz: only indicate the sampling frequency if it is variable within a particular dataset so that we need to code different frequencies for different deployments
    • recorder
    • subset_sites: If the deployment was not done in all the sites of the

  12. How Do I Contact "Bitdefender Customer Service"? A Simple Guide Dataset

    • paperswithcode.com
    Updated Jun 18, 2025
    Cite
    (2025). How Do I Contact "Bitdefender Customer Service"? A Simple Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-do-i-contact-bitdefender-customer-service
    Explore at:
    Dataset updated
    Jun 18, 2025
    Description


    In today's fast-paced digital world, keeping cyber security up to date is one of the most critical tasks for people and organizations. Bitdefender is a well-known firm that provides strong antivirus and internet security software. As with any service or product, users may have problems or questions about their subscriptions, features, billing, or installation. At this point, Bitdefender's customer service is a very significant part of the support system. This easy-to-follow guide, "How Do I Get in Touch with 'Bitdefender Customer Service'?", shows several ways to contact Bitdefender's support team so you can obtain fast, useful, and skilled help.

    Understanding How Important Bitdefender Customer Service Is

    When it comes to cybersecurity services, customer service is vital for keeping users satisfied. Both new and long-time customers may run into unexpected problems: issues with installation, activation keys, system compatibility, payment, or security. Bitdefender has a number of help options tailored to these situations. If you know how to reach its customer care, you can get your problems fixed quickly and with as little hassle as possible.

    Things to Know Before You Contact Bitdefender Support

    You can speed up the process by preparing a few things before you contact Bitdefender's customer service. Have the following information ready:


    The email address for your Bitdefender account

    Your Bitdefender Central login details

    The key or code used to activate your product

    The device or operating system on which the problem occurs

    A full explanation of the problem or error message you are getting

    Being prepared means the support team can help you right away, without several rounds of follow-up calls.

    Step One: Go to Bitdefender Central

    When you wonder, "How do I reach 'Bitdefender Customer Service'?", the first stop is Bitdefender Central. This online dashboard lets you manage your account, installations, devices, and subscriptions. It also offers customer-assistance options such as live chat, ticket submission, and troubleshooting articles.

    You can reach Bitdefender Central by signing into your account on the Bitdefender website. Click the "Support" area, normally near the bottom of the dashboard. Here you will find useful articles, video tutorials, and ways to get in touch.

    Chat Support: Talk to a Bitdefender Representative Right Away

    One of the fastest and easiest ways to reach Bitdefender customer service is live chat. You can open this tool from Bitdefender Central and talk to a live person in real time. The chat service helps you fix problems immediately, whether they concern your account or your technology.

    To start a chat session, click the "Contact Support" or "Chat with an Expert" button. Once connected, explain your situation in detail and follow the support person's instructions. This is the simplest route for issues that need a quick fix but aren't too complicated.

    Email Support: For Thorough, Well-Documented Help

    Email support is another useful option if you need to send in documents or give detailed explanations. On Bitdefender's Central platform, users can create a support ticket. This option suits complex cases such as disputed charges, license transfers, or recurring technical problems that need more sustained support.

    To submit a support ticket, go to the Bitdefender Central customer service page, fill out the form, explain your problem, and attach any relevant files. For straightforward problems, a representative will usually get back to you within a few hours to a day.

    Phone Support: Get in Touch with a Bitdefender Agent

    Sometimes the best and most reassuring option is to call customer service directly. In some regions, Bitdefender offers free phone support, which lets users clearly explain their concerns and get speedy solutions.

    You can find the relevant phone number for your country on the Bitdefender Contact page. Wait times vary with demand, but the agents are ready to answer any question, from minor problems to more complicated security issues.

    Community Forums and Knowledge Base

    If you want to fix problems on your own, or learn more before talking to a professional, the Bitdefender Community Forum is a good place to go. This platform lets users and official moderators discuss products and share advice, fixes, and information about the software.

    The Knowledge Base section is another good source of in-depth information, answers to common questions, and step-by-step guides. Many people find answers here without having to call customer service.

    Help with Bitdefender for Business Users

    You might need more specific guidance if your firm uses Bitdefender GravityZone or other corporate solutions. Business users get dedicated enterprise help through the GravityZone portal, where they can report issues, start conversations, and ask for help tailored to their security and infrastructure needs.

    Most business accounts come with account managers or technical support teams who can help with deployment, integration, and real-time threat handling.

    How to Fix Common Problems Before Calling Support

    This guide also covers when you might not need to contact support at all. You can fix a number of common Bitdefender problems on your own. For example:

    Installation problems: Downloading the full offline installer generally cures the problem.

    Activation errors happen when the license numbers are inaccurate or the subscription has run out.

    Problems with performance can usually be fixed by changing the scan schedule or updating the program.

    The "My Subscriptions" option in Bitdefender Central makes it easy to deal with billing problems.

    Using these tools can save you time and cut down on the number of times you have to call customer service.

    What Remote Help Does for Tech Issues Bitdefender can also aid you with problems that are tougher to fix from a distance. You will need to install a remote access tool so that the technician can take control of your system and fix the problem themselves after you set up a time to chat to a support agent. This is especially useful for those who aren't very good with technology or for firms that have multiple levels of protection.

    Remote help makes sure that problems are handled in a competent way and gives you peace of mind that your digital security is still safe.

    How to Keep Bitdefender Safe and Up to Date Doing regular maintenance is one of the easiest ways to cut down on the need for customer service. You need to update your Bitdefender program on a regular basis to acquire the latest security updates, malware definitions, and functionality upgrades. To avoid compatibility issues, make sure that your operating system and any third-party software you use are also up to date.

    Regular scans, avoiding suspicious websites, and checking the Bitdefender dashboard for alerts will help keep your system safe and minimize the chances that you'll require support right away.

    What Bitdefender Mobile App Support can do You can also get support from the Bitdefender app on your Android or iOS device. The mobile interface lets you manage your devices, renew your membership, and even talk to customer care directly from your phone. This can be quite helpful for folks who need support while they're on the go or who are experiencing trouble with their phone, such setting up a VPN or parental controls.

    Keeping consumer data and conversation private Bitdefender keeps its clients' privacy very high when they talk to them. There are strict laws about privacy and data protection for all kinds of contact, such as phone calls, emails, chats, and remote help. When you need to get in touch with customer service, always utilize real means. Don't give out personal information unless the help process requires you to.

    Final Thoughts on How to Contact Bitdefender Customer Service Bitdefender's customer service is designed to help you with any issue, whether it's a technical problem, a query about a payment, or just a desire for guidance, swiftly, clearly, and professionally. Being able to contact someone, have the proper information ready, and choosing the best route to obtain help can make a great difference in how you feel about the whole thing.

  13. corpus-dataset-normalized-for-persian-and-english

    • huggingface.co
    + more versions
    Cite
    Ali Bahadorani, corpus-dataset-normalized-for-persian-and-english [Dataset]. https://huggingface.co/datasets/ali619/corpus-dataset-normalized-for-persian-and-english
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Ali Bahadorani
    Description

    Dataset Summary

    The Persian portion of this dataset is a collection of 400k blog posts (RohanAiLab/persian_blog) gathered from more than 10 websites. The dataset can be used for NLP tasks such as language modeling, tokenizer training, and text generation.

    To see the Persian data in the Viewer tab, click here.

    The English portion of this dataset is merged from the english-wiki-corpus dataset. Note: If you need only the Persian corpus, click here. Note: The data for both Persian… See the full description on the dataset page: https://huggingface.co/datasets/ali619/corpus-dataset-normalized-for-persian-and-english.
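
    As a quick illustration, here is a minimal sketch of loading this corpus with the Hugging Face datasets library. The "train" split name is an assumption; check the dataset page for the actual configuration and column names.

    # A minimal sketch, assuming the default configuration exposes a "train" split.
    # Check the dataset page if the split or column names differ.
    from datasets import load_dataset

    ds = load_dataset("ali619/corpus-dataset-normalized-for-persian-and-english", split="train")
    print(ds)     # number of rows and column names
    print(ds[0])  # first record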

  14. Watershed Boundary Dataset HUC 10s

    • hub.arcgis.com
    • resilience.climate.gov
    • +3more
    Updated Sep 5, 2023
    Cite
    Esri (2023). Watershed Boundary Dataset HUC 10s [Dataset]. https://hub.arcgis.com/datasets/42f868d269784624b0a2df9757c34abb
    Explore at:
    Dataset updated
    Sep 5, 2023
    Dataset authored and provided by
    Esrihttp://esri.com/
    Area covered
    Description

    Each drainage area is considered a Hydrologic Unit (HU) and is given a Hydrologic Unit Code (HUC) which serves as the unique identifier for the area. HUC 2s, 6s, 8s, 10s, & 12s define the drainage Regions, Subregions, Basins, Subbasins, Watersheds and Subwatersheds, respectively, across the United States. Their boundaries are defined by hydrologic and topographic criteria that delineate an area of land upstream from a specific point on a river and are determined solely upon science-based hydrologic principles, not favoring any administrative boundaries, special projects, or a particular program or agency. The Watershed Boundary Dataset is delineated and georeferenced to the USGS 1:24,000 scale topographic basemap.

    Hydrologic Units are delineated to nest in a multi-level, hierarchical drainage system with corresponding HUCs, so that as you move from small scale to large scale the HUC digits increase in increments of two. For example, the very largest HUCs have 2 digits, and thus are referred to as HUC 2s, and the very smallest HUCs have 12 digits, and thus are referred to as HUC 12s.

    Dataset Summary

    • Phenomenon Mapped: Watersheds in the United States, as delineated by the Watershed Boundary Dataset (WBD)
    • Geographic Extent: Contiguous United States, Alaska, Hawaii, Puerto Rico, Guam, US Virgin Islands, Northern Marianas Islands and American Samoa
    • Projection: Web Mercator
    • Update Frequency: Annual
    • Visible Scale: Visible at all scales; however, USGS recommends this dataset not be used at scales of 1:24,000 or larger.
    • Source: United States Geological Survey (WBD)
    • Data Vintage: January 7, 2025

    What can you do with this layer?

    This layer is suitable for both visualization and analysis across the ArcGIS system. It can be combined with your data and other layers from the ArcGIS Living Atlas of the World in ArcGIS Online and ArcGIS Pro to create powerful web maps that can be used alone or in a story map or other application. Because this layer is part of the ArcGIS Living Atlas of the World, it is easy to add to your map:

    • In ArcGIS Online, you can add this layer to a map by selecting Add then Browse Living Atlas Layers. A window will open. Type "Watershed Boundary Dataset" in the search box and browse to the layer. Select the layer then click Add to Map.
    • In ArcGIS Pro, open a map and select Add Data from the Map Tab. Select Data at the top of the drop down menu. The Add Data dialog box will open on the left side of the box; expand Portal if necessary, then select Living Atlas. Type "Watershed Boundary Dataset" in the search box, browse to the layer then click OK.

    Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
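
    For programmatic access, below is a hedged sketch that queries a WBD layer through the standard ArcGIS REST query interface. The FeatureServer URL and field names are placeholder assumptions; copy the real service endpoint and schema from the layer's item page before running.

    # A sketch of querying a Watershed Boundary Dataset layer via the ArcGIS REST API.
    # SERVICE_URL is a hypothetical placeholder; take the real FeatureServer endpoint
    # from the layer's item page. The field names in outFields are assumptions too.
    import requests

    SERVICE_URL = "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/WBD_HUC_10s/FeatureServer/0/query"

    params = {
        "where": "1=1",              # no filter; the server pages large results
        "outFields": "huc10,name",   # assumed field names; inspect the layer schema first
        "returnGeometry": "false",
        "resultRecordCount": 5,
        "f": "json",
    }
    resp = requests.get(SERVICE_URL, params=params, timeout=30)
    resp.raise_for_status()
    for feature in resp.json().get("features", []):
        print(feature["attributes"])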

  15. Esports Performance Rankings and Results

    • kaggle.com
    Updated Dec 12, 2022
    Cite
    The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu/suggestions
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Esports Performance Rankings and Results

    Performance Rankings and Results from Multiple Esports Platforms

    By [source]

    About this dataset

    This dataset provides a detailed look into the world of collegiate competitive video gaming. It covers performance rankings and results across multiple esports platforms, down to the individual team and university rankings within each tournament. Fans can look up statistics on their favorite teams or explore how university gamers battle to be the best. Fields include Match ID, Team 1, university affiliation, points earned or lost in each match, and the Seed and UniSeed values assigned to exceptional teams, along with team names and corresponding websites for further tournament statistics.



    How to use the dataset

    Download Files. First, make sure you have downloaded the CS_week1, CS_week2, CS_week3 and seeds datasets from Kaggle, along with the currentRankings file for each week of competition. All files should keep their originally assigned names so that your analysis tools can read them properly (e.g. CS_week1.csv); a sketch of fetching them programmatically follows.
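
    As a hedged sketch, the files can be fetched with the official Kaggle API, assuming your credentials are configured in ~/.kaggle/kaggle.json; the dataset slug is taken from the citation URL above.

    # A sketch of downloading the competition files with the official Kaggle API.
    # Assumes API credentials are already configured in ~/.kaggle/kaggle.json.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    # Slug taken from the dataset's citation URL above.
    api.dataset_download_files(
        "thedevastator/unlocking-collegiate-esports-performance-with-bu",
        path=".",
        unzip=True,  # extracts CS_week1.csv, CS_week2.csv, CS_week3.csv, seeds, etc.
    )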

    Understand File Structure. Once all the data has been collected and organized into separate files, become familiar with what each file contains. The main folder holds two kinds of data files: the weekly match files (weeks 1-3) and the seedings. The weekly files list teams matched against one another, with each team's university, the point score from the match result, and the team name and website URL associated with the university entry. The seedings file ranks the university entries and likewise carries team names, website URLs, and so on. An additional currentRankings file records scores for individual players and teams over each week of competition.

    Analyzing Data. Now that everything is set up, it's time to explore. You can dig into trends among universities or individual players, looking at specific match performances or overall standings across the weeks of competition. You can also build graphs from the compiled data in the BUECTracker dataset. For example, to compare two universities, say Harvard University vs Cornell University, from the beginning of the event, you would extract their respective points and dates (found under the result columns), regions (North America vs Europe, etc.), and general statistics such as maps played, along with any other custom ideas that come up when working with similar datasets; a minimal sketch of this comparison appears below.
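
    The following is a minimal pandas sketch of that comparison, assuming CS_week1.csv carries the documented University column plus a points-style result column. The "Points" column name is an assumption; adjust it to the actual CSV header.

    # A minimal sketch, assuming CS_week1.csv has a "University" column and a
    # points-style result column. "Points" is an assumed name; check the header.
    import pandas as pd

    week1 = pd.read_csv("CS_week1.csv")

    schools = ["Harvard University", "Cornell University"]
    subset = week1[week1["University"].isin(schools)]

    # Total, average, and match count per university for the week.
    summary = subset.groupby("University")["Points"].agg(["sum", "mean", "count"])
    print(summary)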

    Research Ideas

    • Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
    • Assess which esports platforms are the most popular among gamers.
    • Gain a better understanding of player rankings across different regions, based on the ranking system, to create targeted strategies that could boost individual players' scoring potential or a team's overall success in competitive gaming events.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: CS_week1.csv

    | Column name | Description                                    |
    |:------------|:-----------------------------------------------|
    | Match ID    | Unique identifier for each match. (Integer)    |
    | Team 1      | Name of the first team in the match. (String)  |
    | University  | University associated with the team. (String)  |

    File: CS_week1_currentRankings.csv

    | Column name | Description |
    |:------------|:------------|...

  16. Watershed Boundary Dataset HUC 6s

    • resilience.climate.gov
    • resilience-fema.hub.arcgis.com
    • +3more
    Updated Sep 5, 2023
    + more versions
    Cite
    Esri (2023). Watershed Boundary Dataset HUC 6s [Dataset]. https://resilience.climate.gov/datasets/esri::watershed-boundary-dataset-huc-6s
    Explore at:
    Dataset updated
    Sep 5, 2023
    Dataset authored and provided by
    Esri
    Area covered
    Description

    Each drainage area is considered a Hydrologic Unit (HU) and is given a Hydrologic Unit Code (HUC) which serves as the unique identifier for the area. HUC 2s, 6s, 8s, 10s, & 12s define the drainage Regions, Subregions, Basins, Subbasins, Watersheds and Subwatersheds, respectively, across the United States. Their boundaries are defined by hydrologic and topographic criteria that delineate an area of land upstream from a specific point on a river and are determined solely upon science-based hydrologic principles, not favoring any administrative boundaries, special projects, or a particular program or agency. The Watershed Boundary Dataset is delineated and georeferenced to the USGS 1:24,000 scale topographic basemap.

    Hydrologic Units are delineated to nest in a multi-level, hierarchical drainage system with corresponding HUCs, so that as you move from small scale to large scale the HUC digits increase in increments of two. For example, the very largest HUCs have 2 digits, and thus are referred to as HUC 2s, and the very smallest HUCs have 12 digits, and thus are referred to as HUC 12s.

    Dataset Summary

    • Phenomenon Mapped: Watersheds in the United States, as delineated by the Watershed Boundary Dataset (WBD)
    • Geographic Extent: Contiguous United States, Alaska, Hawaii, Puerto Rico, Guam, US Virgin Islands, Northern Marianas Islands and American Samoa
    • Projection: Web Mercator
    • Update Frequency: Annual
    • Visible Scale: Visible at all scales; however, USGS recommends this dataset not be used at scales of 1:24,000 or larger.
    • Source: United States Geological Survey (WBD)
    • Data Vintage: January 7, 2025

    What can you do with this layer?

    This layer is suitable for both visualization and analysis across the ArcGIS system. It can be combined with your data and other layers from the ArcGIS Living Atlas of the World in ArcGIS Online and ArcGIS Pro to create powerful web maps that can be used alone or in a story map or other application. Because this layer is part of the ArcGIS Living Atlas of the World, it is easy to add to your map:

    • In ArcGIS Online, you can add this layer to a map by selecting Add then Browse Living Atlas Layers. A window will open. Type "Watershed Boundary Dataset" in the search box and browse to the layer. Select the layer then click Add to Map.
    • In ArcGIS Pro, open a map and select Add Data from the Map Tab. Select Data at the top of the drop down menu. The Add Data dialog box will open on the left side of the box; expand Portal if necessary, then select Living Atlas. Type "Watershed Boundary Dataset" in the search box, browse to the layer then click OK.

    Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.

  17. NSF Public Access Repository

    • catalog.data.gov
    Updated Sep 19, 2021
    + more versions
    Cite
    National Science Foundation (2021). NSF Public Access Repository [Dataset]. https://catalog.data.gov/dataset/nsf-public-access-repository
    Explore at:
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    National Science Foundationhttp://www.nsf.gov/
    Description

    The NSF Public Access Repository contains an initial collection of journal publications, comprising the final accepted version of the peer-reviewed manuscript or the version of record. To do this, NSF draws upon services provided by the publisher community, including the Clearinghouse of Open Research for the United States, CrossRef, and the International Standard Serial Number. When clicking on a Digital Object Identifier number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available without a charge during the embargo, or administrative, interval. Some links on this page may take you to non-federal websites, whose policies may differ from this website's.

  18. FOI-02670 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Apr 13, 2025
    Cite
    nhsbsa.net (2025). FOI-02670 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-02670
    Explore at:
    Dataset updated
    Apr 13, 2025
    Dataset provided by
    NHS Business Services Authority
    Description

    1) How many requests for a mandatory reversal have been received following an unsuccessful claim to the Vaccine Damage Payment Scheme?
    2) How many of the claimants have made requests for a mandatory reversal more than once?
    3) For those claimants who have made multiple requests for a mandatory reversal, have any been successful? (If the figure is over 5, please state.)

    The NHS Business Services Authority (NHSBSA) received your request on 17 March 2025. We have handled your request under the Freedom of Information Act 2000 (FOIA).

    Our response: I can confirm that the NHSBSA holds the information you have requested. All data as of 28 February 2025. All data relates to claims received by the NHSBSA and those transferred from the previous administrator, the Department for Work and Pensions (DWP). All data provided relates to COVID-19 vaccines only.

    Fewer than five: Please be aware that I have decided not to release the full details where the total number of individuals falls below five. This is because the individuals could be identified when combined with other information that may be in the public domain or reasonably available. This information falls under the exemption in section 40 subsections 2 and 3 (a) of the Freedom of Information Act (FOIA). This is because it would breach the first data protection principle as: a. it is not fair to disclose individuals' personal details to the world and is likely to cause damage or distress; b. these details are not of sufficient interest to the public to warrant an intrusion into the privacy of the individual.

    Please click the below web link to see the exemption in full: https://www.legislation.gov.uk/ukpga/2000/36/section/40

  19. SPM LTDS Appendix 6 Connection Activity (Table 6) 132kV - Mid-Year Update -...

    • demo.dev.datopian.com
    Updated May 27, 2025
    + more versions
    Cite
    (2025). SPM LTDS Appendix 6 Connection Activity (Table 6) 132kV - Mid-Year Update - Dataset - Datopian CKAN instance [Dataset]. https://demo.dev.datopian.com/dataset/sp-energy-networks--spm-ltds-appendix-6-connection-activity-table-6-132kv-6-montly-update
    Explore at:
    Dataset updated
    May 27, 2025
    Description

    The "SPM LTDS Appendix 6 Connection Activity (Table 6) 132kV) - Mid-Year Update" dataset provides details of connection applications and budget estimate offers made are provided, together with an indication of the source Grid Supply Point (GSP) substation.Click here to access our full Long Term Development Statements for both SPD & SPM. For additional information on column definitions, please click on the Dataset schema link below.Note: A fully formatted copy of this appendix can be downloaded from the Export tab under Alternative exports.Download dataset metadata (JSON)If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above. Data Triage : SPEN Data Triage Risk Assessments provide information and a detailed overview of how we approach the Data Triage process. The risk assessment will determine the dataset classification and whether it can be made available, and under which licence. Click below to view the Data Triage document for this dataset. These are hosted on our SP Energy Networks website and can be viewed by clicking here Download File

  20. FOI 30960 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Jan 25, 2023
    + more versions
    Cite
    nhsbsa.net (2023). FOI 30960 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-30960
    Explore at:
    Dataset updated
    Jan 25, 2023
    Dataset provided by
    NHS Business Services Authority
    Description

    This is because it would breach the first data protection principle as: a) it is not fair to disclose claimant personal details to the world and is likely to cause damage or distress; b) these details are not of sufficient interest to the public to warrant an intrusion into the privacy of the claimant. Please click the below web link to see the exemption in full: https://www.legislation.gov.uk/ukpga/2000/36/section/40

    Breach of Patient Confidentiality. Please note that the identification of claimants is also a breach of the common law duty of confidence. A claimant who has been identified could make a claim against the NHSBSA or yourself for the disclosure of the confidential information. The information requested is therefore being withheld as it falls under the exemption in section 41(1) 'Information provided in confidence' of the Freedom of Information Act. Please click the below web link to see the exemption in full: https://www.legislation.gov.uk/ukpga/2000/36/section/41
