25 datasets found
  1. BBC NEWS SUMMARY(CSV FORMAT)

    • kaggle.com
    zip
    Updated Sep 9, 2024
    Cite
    Dhiraj (2024). BBC NEWS SUMMARY(CSV FORMAT) [Dataset]. https://www.kaggle.com/datasets/dignity45/bbc-news-summarycsv-format
    Explore at:
    zip (2097600 bytes)
    Dataset updated
    Sep 9, 2024
    Authors
    Dhiraj
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description: Text Summarization Dataset

    This dataset is designed for users aiming to train models for text summarization. It contains 2,225 rows of data with two columns: "Text" and "Summary". Each row features a detailed news article or piece of text paired with its corresponding summary, providing a rich resource for developing and fine-tuning summarization algorithms.

    Key Features:

    • Text: Full-length articles or passages that serve as the input for summarization.
    • Summary: Concise summaries of the articles, which are ideal for training models to generate brief, coherent summaries from longer texts.

    Future Enhancements:

    This evolving dataset is planned to include additional features, such as text class labels, in future updates. These enhancements will provide more context and facilitate the development of models that can perform summarization across different categories of news content.

    Usage:

    Ideal for researchers and developers focused on text summarization tasks, this dataset enables the training of models to effectively compress information while retaining the essence of the original content.

    Acknowledgment

    We would like to extend our sincere gratitude to the dataset creator for their contribution to this valuable resource. This dataset, sourced from the BBC News Summary dataset on Kaggle, was created by Pariza. Their work has provided an invaluable asset for those working on text summarization tasks, and we appreciate their efforts in curating and sharing this data with the community.

    Thank you for supporting research and development in the field of natural language processing!

    File Description

    This script processes and consolidates text data from various directories containing news articles and their corresponding summaries. It reads the files from specified folders, handles encoding issues, and then creates a DataFrame that is saved as a CSV file for further analysis.

    Key Components:

    1. Imports:

      • numpy (np): Numerical operations library, though it's not used in this script.
      • pandas (pd): Data manipulation and analysis library.
      • os: For interacting with the operating system, e.g., building file paths.
      • glob: For file pattern matching and retrieving file paths.
    2. Function: get_texts

      • Parameters:
        • text_folders: List of folders containing news article text files.
        • text_list: List to store the content of text files.
        • summ_folder: List of folders containing summary text files.
        • sum_list: List to store the content of summary files.
        • encodings: List of encodings to try for reading files.
      • Purpose:
        • Reads text files from specified folders, handles different encodings, and appends the content to text_list and sum_list.
        • Returns the updated lists of texts and summaries.
    3. Data Preparation:

      • text_folder: List of directories for news articles.
      • summ_folder: List of directories for summaries.
      • text_list and summ_list: Initialize empty lists to store the contents.
      • data_df: Empty DataFrame to store the final data.
    4. Execution:

      • Calls get_texts function to populate text_list and summ_list.
      • Creates a DataFrame data_df with columns 'Text' and 'Summary'.
      • Saves data_df to a CSV file at /kaggle/working/bbc_news_data.csv.
    5. Output:

      • Prints the first few entries of the DataFrame to verify the content.

    Column Descriptions:

    • Text: Contains the full-length articles or passages of news content. This column is used as the input for summarization models.
    • Summary: Contains concise summaries of the corresponding articles in the "Text" column. This column is used as the target output for summarization models.

    Usage:

    • This script is designed to be run in a Kaggle environment where paths to text data are predefined.
    • It is intended for preprocessing and saving text data from news articles and summaries for subsequent analysis or model training.
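
    For reference, here is a minimal sketch of the preprocessing script described above, assuming the standard folder layout of the source BBC News Summary dataset (the base path, category names, and encoding list are illustrative assumptions, not the author's exact code):

      import glob
      import os
      import pandas as pd

      def read_folder(folder, encodings):
          """Read every .txt file in a folder, trying each encoding in turn."""
          contents = []
          for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
              for enc in encodings:
                  try:
                      with open(path, encoding=enc) as f:
                          contents.append(f.read())
                      break
                  except UnicodeDecodeError:
                      continue
          return contents

      def get_texts(text_folders, text_list, summ_folder, sum_list,
                    encodings=("utf-8", "latin-1", "cp1252")):
          """Populate text_list and sum_list from the article and summary folders."""
          for folder in text_folders:
              text_list.extend(read_folder(folder, encodings))
          for folder in summ_folder:
              sum_list.extend(read_folder(folder, encodings))
          return text_list, sum_list

      # Hypothetical Kaggle input layout: one sub-folder per news category.
      base = "/kaggle/input/bbc-news-summary/BBC News Summary"
      categories = ["business", "entertainment", "politics", "sport", "tech"]
      text_folder = [os.path.join(base, "News Articles", c) for c in categories]
      summ_folder = [os.path.join(base, "Summaries", c) for c in categories]

      text_list, summ_list = get_texts(text_folder, [], summ_folder, [])
      data_df = pd.DataFrame({"Text": text_list, "Summary": summ_list})
      data_df.to_csv("/kaggle/working/bbc_news_data.csv", index=False)
      print(data_df.head())
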
  2. Data from: SEC Filings

    • kaggle.com
    zip
    Updated Jun 5, 2020
    Cite
    Google BigQuery (2020). SEC Filings [Dataset]. https://www.kaggle.com/datasets/bigquery/sec-filings
    Explore at:
    zip (0 bytes)
    Dataset updated
    Jun 5, 2020
    Dataset provided by
    BigQuery: https://cloud.google.com/bigquery
    Authors
    Google BigQuery
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. For more information please see this site.

    To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience.

    DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.

  3. Ken Jee YouTube Data

    • kaggle.com
    zip
    Updated Jan 22, 2022
    Cite
    Ken Jee (2022). Ken Jee YouTube Data [Dataset]. https://www.kaggle.com/datasets/kenjee/ken-jee-youtube-data
    Explore at:
    zip (6556461 bytes)
    Dataset updated
    Jan 22, 2022
    Authors
    Ken Jee
    Area covered
    YouTube
    Description

    Context

    I've been creating videos on YouTube since November of 2017 (https://www.youtube.com/c/KenJee1) with the mission of making data science accessible to more people. One of the best ways to do this is to tell stories and work on projects. This is my attempt at my first community project. I am making my YouTube data available for everyone to help better understand the growth of my YouTube community and to think about ways it could be improved! I would love for everyone in the community to feel like they had some hand in contributing to the channel.

    Announcement Video: https://youtu.be/YPph59-rTxA

    I will be sharing my favorite projects in a few of my videos (with permission, of course), and I would also like to give away a few small prizes to the top featured notebooks. I hope you have fun with the analysis; I'm interested in seeing what you find in the data!

    For those looking for a place to start, some things I'm thinking about are:
    - What are the themes of the comment data?
    - What types of video titles and thumbnails drive the most traffic?
    - Who is my core audience and what are they interested in?
    - What types of videos have led to the most growth?
    - What type of content are people engaging with the most or watching the longest?

    Some advanced projects could be:
    - Creating a chatbot to respond to common comments with videos where I have addressed a topic
    - Pulling sentiment from thumbnails and titles and comparing that with performance

    Data I would like to add over time:
    - Video descriptions
    - Video subtitles
    - Actual video data

    Content

    There are four files in this repo. The relevant data included in most of them is from Nov 2017 - Jan 2022. I gathered some of this data via the YouTube API and the rest from my specific analytics.

    1) Aggregated Metrics By Video - This has all the topline metrics for my channel from its start (around 2015 to Jan 22, 2022); I didn't post my first video until around November 2017.
    2) Aggregated Metrics By Video with Country and Subscriber Status - This has the same data as Aggregated Metrics By Video, but it includes dimensions for which country people are viewing from and whether the viewers are subscribed to the channel.
    3) Video Performance Over Time - This has the daily data for each of my videos.
    4) All Comments - This is all of my comment data gathered from the YouTube API. I have anonymized the users, so don't worry about your name showing up!

    Acknowledgements

    This obviously wouldn't be possible without all of the wonderful people who watch and interact with my videos! I'm incredibly grateful for you all and I'm so happy I can share this project with you!

    License

    I collected this data from the YouTube API and through my own Google analytics. Thus, use of it must uphold the YouTube API's terms of service: https://developers.google.com/youtube/terms/api-services-terms-of-service

  4. 2.03 311 First-Call Resolution (summary)

    • catalog.data.gov
    • performance.tempe.gov
    • +7 more
    Updated Nov 1, 2025
    Cite
    City of Tempe (2025). 2.03 311 First-Call Resolution (summary) [Dataset]. https://catalog.data.gov/dataset/2-03-311-first-call-resolution-summary-5048d
    Explore at:
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    City of Tempe
    Description

    The Customer Relations Center (CRC), or Tempe 311, is often the first and possibly only contact a resident has with the City. Our goal is to make each interaction as smooth and efficient as possible. To efficiently provide our residents an improved level of customer service, Tempe 311 strives to serve residents by acting as the central connection to accessible information and government services. Our purpose is realized through our ability to resolve calls with a single point of contact. When we do this, we have met 311's mission and provided effective customer service. The Tempe 311 CRC strives to achieve a Single Point of Contact (SPOC) resolution rate greater than or equal to 75% of incoming calls.

    This page provides data for the 311 First-Call Resolution Rate performance measure. The performance measure dashboard is available at 2.03 311 First-Call Resolution Rate.

    Additional Information
    Contact: Moncayo, Kim
    Contact E-Mail: Kim_Moncayo@tempe.gov
    Data Source Type: Accela CRM, Excel, Cisco Unified Intelligence
    Preparation Method: The data from every 311 call is entered into the city's Accela CRM database system. Using that information in conjunction with Cisco Unified Intelligence Center, a separate report is generated to pull out transferred and non-311 calls.
    Publish Frequency: Quarterly
    Publish Method: Manual
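
    As an illustration, the SPOC target could be checked with a few lines of pandas (a sketch only; the file and column names here are assumptions, so consult the dataset's data dictionary for the authoritative fields):

      import pandas as pd

      calls = pd.read_csv("tempe_311_first_call_resolution.csv")  # hypothetical export
      # Share of incoming calls resolved at a single point of contact,
      # assuming a 0/1 resolved_first_contact flag per call.
      spoc_rate = calls["resolved_first_contact"].mean() * 100
      print(f"SPOC resolution rate: {spoc_rate:.1f}% (target: >= 75%)")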

  5. Cristiano Ronaldo YouTube Stats Data (Daily Pull)

    • kaggle.com
    zip
    Updated Nov 9, 2025
    Cite
    Ahmed Ashraf (2025). Cristiano Ronaldo YouTube Stats Data (Daily Pull) [Dataset]. https://www.kaggle.com/datasets/ahmad03038/cristiano-ronaldo-youtube-stats-data-daily-pull
    Explore at:
    zip (13888 bytes)
    Dataset updated
    Nov 9, 2025
    Authors
    Ahmed Ashraf
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    This dataset collects and updates daily statistics for Cristiano Ronaldo's YouTube channel, tracking key metrics such as view count, like count, and comment count for each video.

    The dataset is updated automatically every day, with the pull_date column indicating when the data was fetched. This allows you to analyze video performance over time and observe trends in engagement.

    This dataset is useful for content analysis, trend tracking, and time-series analysis of Cristiano Ronaldo’s YouTube presence.
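
    For example, per-video view growth across daily pulls could be computed like this (a sketch; the video_id and view_count column names are assumptions based on the description, while pull_date is named above):

      import pandas as pd

      df = pd.read_csv("ronaldo_youtube_stats.csv", parse_dates=["pull_date"])
      growth = (df.sort_values("pull_date")
                  .groupby("video_id")["view_count"]
                  .agg(first="first", last="last"))
      growth["gain"] = growth["last"] - growth["first"]  # views gained over the window
      print(growth.sort_values("gain", ascending=False).head())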

  6. Google Data Analytics Capstone Project: Netflix

    • kaggle.com
    zip
    Updated Jan 25, 2024
    Cite
    Doga Celik (2024). Google Data Analytics Capstone Project: Netflix [Dataset]. https://www.kaggle.com/datasets/dogacelik/google-data-analytics-capstone-project-netflix
    Explore at:
    zip (59851 bytes)
    Dataset updated
    Jan 25, 2024
    Authors
    Doga Celik
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction:

    In this case study, the skills I acquired from the Google Data Analytics Professional Certificate course are demonstrated. These skills will be used to complete an imagined task given by Netflix. The analysis process for this task consists of the following steps: Ask, Prepare, Process, Analyze, Share, and Act.

    Scenario:

    The Netflix Chief Content Officer, Bela Bajaria, believes that a company's success depends on providing customers what they want. Bajaria stated that the goal of this task is to find the most wanted content for the movies that will be added to the portfolio. Most movie contracts are signed before the movies come to theaters, and it is hard to know whether customers really want to watch a movie and whether it will be successful. Therefore, my team wants to understand what type of content a movie's success depends on. From these insights, my team will design an investment strategy to choose the most popular movies expected to be in theaters in the near future. But first, Netflix executives must approve our recommendations. To do that, we must provide convincing data insights along with professional data visualizations.

    About the Company:

    At Netflix, we want to entertain the world. Whatever your taste, and no matter where you live, we give you access to best-in-class TV series, documentaries, feature films and games. Our members control what they want to watch, when they want it, in one simple subscription. We’re streaming in more than 30 languages and 190 countries, because great stories can come from anywhere and be loved everywhere. We are the world’s biggest fans of entertainment, and we’re always looking to help you find your next favorite story.

    As a company, Netflix knows that it is important to acquire or produce movies that people want to watch.

    Therefore, Bajaria has set a clear goal: define an investment strategy that will allow Netflix to provide customers the movies they want to watch, which will maximize sales.

    Ask:

    Business Task: To find out what kinds of movies customers want to watch and whether content type really correlates with a movie's success.

    Stakeholders:

    Bela Bajaria: She joined Netflix in 2016 to oversee unscripted and scripted series. Bajaria is also responsible for content selection and strategy for different regions.

    Netflix content analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Netflix content strategy.

    Netflix executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended content program.

    Prepare:

    I start my preparation procedure by downloading every piece of data I'll need for the study. Top 1000 Highest-Grossing Movies of All Time.csv will be used. Additionally, 15 Lowest-Grossing Movies of All Time.csv was found during the data research, and this dataset will be analyzed as well. The data has been made available by IMDb and shared at the two following URLs: https://www.imdb.com/list/ls098063263/ and https://www.imdb.com/list/ls069238222/ .

    Process:

    Data Cleaning:

    SQL: To begin the data cleaning process, I opened both CSV files in SQL and conducted the following operations:

    • Checked for and removed any duplicates.
    • Checked if there were any null values.
    • Removed the columns that are not necessary.
    • Trimmed the Description column to have only gross profit in it. (This cleaning procedure was only used for the 1000 Highest-Grossing Movies of All Time.csv dataset.)
    • Renamed the Description column as Gross_Profit. (This cleaning procedure was only used for the 1000 Highest-Grossing Movies of All Time.csv dataset.)

    The following SQL code was used during the data cleaning:

    SQL CODE used for Highest Grossing Movies DATASET

    SELECT Position, SUBSTR(Description, 34, 12) AS Gross_Profit, Title, IMDb_Rating, Runtime_mins_, Year, Genres, Num_Votes, Release_Date FROM `even-electron-400301.Highest_Gross_Movies.1`

    SQL CODE used for Lowest Grossing Movies DATASET

    SELECT Position, Title, IMDb_Rating, Runtime_mins_, Year, Genres, Num_Votes, Release_Date FROM `even-electron-400301.Lowest_Grossing_Movies.2` ORDER BY Position

    Analyze:

    As a starter, I want to reemphasize the business task once again: does content have a big impact on a movie's success?

    To answer this question, there were a few pieces of information that I projected I could pull out and use during my analysis:

    • Average gross profit
    • Number of genres
    • Total gross profit of the most popular genres
    • The distribution of gross income across genres

    I used Microsoft Excel for the bullet points above. The operations to achieve the values above are as follows:

    • AVERAGE function for average gross profit in 1000 Highest-Grossing Movies of All Time.
    • Created a pivot table to work on Genres and Gross_Pr...

  7. R scripts

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated May 10, 2018
    Cite
    Xueying Han (2018). R scripts [Dataset]. http://doi.org/10.6084/m9.figshare.5513170.v3
    Explore at:
    txt
    Dataset updated
    May 10, 2018
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Xueying Han
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts in this fileset are those used in the PLOS ONE publication "A snapshot of translational research funded by the National Institutes of Health (NIH): A case study using behavioral and social science research awards and Clinical and Translational Science Awards funded publications." The article can be accessed here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196545

    This consists of all R scripts used for data cleaning, data manipulation, and statistical analysis in the publication. There are eleven files in total:

    1. "Step1a.bBSSR.format.grants.and.publications.data.R" combines all bBSSR 2008-2014 grant award data and associated publications downloaded from NIH Reporter.
    2. "Step1b.BSSR.format.grants.and.publications.data.R" combines all BSSR-only 2008-2014 grant award data and associated publications downloaded from NIH Reporter.
    3. "Step2a.bBSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated bBSSR publication data.
    4. "Step2b.BSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated BSSR-only publication data.
    5. "Step3.summary.stats.R" performs summary statistics.
    6. "Step4.time.to.first.publication.R" performs time-to-first-publication analysis.
    7. "Step5.time.to.citation.analysis.R" performs time-to-first-citation and time-to-overall-citation analyses.
    8. "Step6.combine.NIH.iCite.data.R" combines NIH iCite citation data.
    9. "Step7.iCite.data.analysis.R" performs citation analysis on the combined iCite data.
    10. "Step8.MeSH.descriptors.R" queries PubMed and pulls down all MeSH descriptors for all publications.
    11. "Step9.CTSA.publications.R" compares the percent of translational publications among bBSSR, BSSR-only, and CTSA publications.

  8. NVIDIA Stock Data Daily Updated

    • kaggle.com
    zip
    Updated Nov 14, 2025
    Cite
    The Hidden Layer (2025). NVIDIA Stock Data Daily Updated [Dataset]. https://www.kaggle.com/datasets/isaaclopgu/nvidia-stock-data-daily-updated
    Explore at:
    zip (299223 bytes)
    Dataset updated
    Nov 14, 2025
    Authors
    The Hidden Layer
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this Dataset

    This dataset provides a comprehensive, up-to-date collection of daily historical stock data for NVIDIA Corporation (NVDA). It captures key trading data, including opening price, closing price, trading volume, and more, making it suitable for time series analysis, financial forecasting, and algorithmic trading simulations.

    About the Company

    NVIDIA is a leading technology company renowned for its innovations in graphics processing units (GPUs), artificial intelligence (AI), and computer hardware. The company was founded in January 1993 and went public on the NASDAQ in January 1999 under the ticker symbol NVDA. NVIDIA's GPUs and the CUDA platform have become industry standards, serving as the backbone for AI research and development. The company has experienced significant growth fueled by the gaming industry and breakthroughs in AI and deep learning. In recent years, NVIDIA has expanded its reach into data centers, autonomous vehicles, and other high-growth markets.

    Data Dictionary

    Date: The specific calendar date for the trading session, formatted as YYYY-MM-DD.

    Open: The price at which the stock opened at the start of the trading day.

    High: The highest price reached by the stock during the trading day.

    Low: The lowest price recorded for the stock during the trading day.

    Close: The price at which the stock closed at the end of the trading day.

    Volume: The total number of shares traded on that particular day.

    Data Collection

    The data for this dataset is collected using the yfinance Python library, which pulls information directly from the Yahoo Finance API. The dataset covers daily stock prices for NVIDIA Corporation (NVDA), with each entry representing a single trading day.
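
    A minimal reproduction of this kind of pull with yfinance might look like the following (the period and output path are assumptions, not the dataset's exact collection script):

      import yfinance as yf

      # Full daily OHLCV history for NVDA from Yahoo Finance.
      nvda = yf.Ticker("NVDA").history(period="max", interval="1d")
      nvda = nvda.reset_index()[["Date", "Open", "High", "Low", "Close", "Volume"]]
      nvda.to_csv("nvda_daily.csv", index=False)
      print(nvda.tail())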

    Potential Use Cases

    Financial Analysis: Analyze trends and volatility in NVIDIA's stock price over time.

    Machine Learning: Develop and test models for stock price prediction using time-series algorithms like LSTM and ARIMA.

    Backtesting: Use the historical data to backtest and optimize trading strategies.

    Technical Analysis: Calculate and visualize technical indicators such as Moving Averages, Bollinger Bands, and MACD.
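
    As a starting point, the indicators mentioned above can be computed directly in pandas (this sketch assumes the nvda DataFrame from the pull example under "Data Collection"):

      # 20-day simple moving average and Bollinger Bands.
      nvda["SMA20"] = nvda["Close"].rolling(20).mean()
      std20 = nvda["Close"].rolling(20).std()
      nvda["BB_upper"] = nvda["SMA20"] + 2 * std20
      nvda["BB_lower"] = nvda["SMA20"] - 2 * std20

      # MACD: difference of the 12- and 26-day exponential moving averages.
      ema12 = nvda["Close"].ewm(span=12, adjust=False).mean()
      ema26 = nvda["Close"].ewm(span=26, adjust=False).mean()
      nvda["MACD"] = ema12 - ema26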

  9. A Cross-Sectional Analysis of Oil Pulling on YouTube Shorts data

    • figshare.com
    xlsx
    Updated Jul 20, 2025
    Cite
    Jun Yaung; Sun Ha Park; Shahed Al-Khalifah (2025). A Cross-Sectional Analysis of Oil Pulling on YouTube Shorts data [Dataset]. http://doi.org/10.6084/m9.figshare.29605031.v1
    Explore at:
    xlsx
    Dataset updated
    Jul 20, 2025
    Dataset provided by
    figshare
    Authors
    Jun Yaung; Sun Ha Park; Shahed Al-Khalifah
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    This dataset accompanies a cross-sectional content analysis examining the portrayal of oil pulling in short-form video content on YouTube Shorts. The study systematically analyzed 47 publicly accessible videos retrieved using the search term “oil pulling.” Each video was coded for speaker type, stance toward oil pulling, cited benefits and risks, reference to scientific evidence, engagement metrics (likes, views, duration), and content characteristics (e.g., recommendations, disclaimers, tone). The analysis is grounded in Social Cognitive Theory (SCT), with a focus on how modeled behaviors and perceived credibility of speakers influence viewers’ oral hygiene attitudes and practices. The dataset includes a structured spreadsheet listing video URLs, speaker classifications, coded variables, and summary statistics. This resource supports transparency, reproducibility, and further research on digital oral health communication, social media influence, and misinformation.

  10. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    zip
    Updated Oct 2, 2025
    Cite
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/code
    Explore at:
    zip (7042864 bytes)
    Dataset updated
    Oct 2, 2025
    Authors
    Jonathan Pilafas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. The dashboard and its features are described in the Domo blog post "Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard".

    This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. The data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information I could pull from the site as an open-source tool for March Madness analysis.

    Key features of the dataset include:
    - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints on KenPom's website. Please note that the Height/Experience data only goes back to 2007, but every other source contains data from 2002 onward.
    - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season, containing every KenPom metric you can possibly think of. It can serve as a single source of truth for your March Madness analysis and provides the granularity necessary to perform any type of analysis you can think of.
    - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

    These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

    Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in its full length and the respective acronyms it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate historical conferences from current conferences.

    From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because active coaching length typically correlates with a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and another to differentiate the teams who were ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
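
    In pandas terms, the team-name-and-season join described above looks roughly like this (a sketch only; the real pipeline runs in Domo's Magic ETL, and the file and column names here are assumptions):

      import pandas as pd

      efficiency = pd.read_csv("INT_efficiency.csv")        # one row per team-season
      four_factors = pd.read_csv("INT_four_factors.csv")
      name_map = pd.read_csv("ncaam_conference_name_mapping.csv")

      merged = (efficiency
                .merge(four_factors, on=["TeamName", "Season"], how="left")
                .merge(name_map, on=["TeamName", "Season"], how="left"))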

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  11. SEC Public Dataset

    • console.cloud.google.com
    Updated Apr 22, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=ES (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=ES
    Explore at:
    Dataset updated
    Apr 22, 2023
    Dataset provided by
    Google: http://google.com/
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
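
    Querying the quick summary view from Python might look like this (a sketch using the google-cloud-bigquery client; the dataset and table names are assumptions, so check the listing in the BigQuery console before running):

      from google.cloud import bigquery

      client = bigquery.Client()  # uses your default GCP project and credentials
      sql = """
          SELECT *
          FROM `bigquery-public-data.sec_quarterly_financials.quick_summary`
          LIMIT 10
      """
      for row in client.query(sql).result():
          print(dict(row))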

  12. Planning Department Project Application Review metrics

    • data.sfgov.org
    • catalog.data.gov
    csv, xlsx, xml
    Updated Nov 18, 2025
    Cite
    (2025). Planning Department Project Application Review metrics [Dataset]. https://data.sfgov.org/City-Infrastructure/Planning-Department-Project-Application-Review-met/d4jk-jw33
    Explore at:
    xlsx, csv, xml
    Dataset updated
    Nov 18, 2025
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    A. SUMMARY This dataset provides review time metrics for the San Francisco Planning Department’s application review process. The following metrics are provided: total days to Planning approval, days to finish completeness review, days to first check plan letter, and days to complete resubmission review. Targets for each metric and outcomes relative to these targets are also included. These metrics allow for ongoing tracking for individual planning projects and for the calculation of summary statistics for Planning review timelines. There are both Project level metrics and project event level metrics in this table.

    You can see a dashboard which shows the City's current permit processing performance on sf.gov.

    B. HOW THE DATASET IS CREATED Planning application review is tracked within Planning’s Project and Permit Tracking System (PPTS). Planners enter review period start and end dates in PPTS when review milestones are reached. Review timeline data is extracted from PPTS and review timelines and outcomes are calculated and consolidated within this dataset. The dataset is generated by a data model that pulls from multiple raw Accela sources and joins them together.

    C. UPDATE PROCESS This dataset is updated daily overnight.

    D. HOW TO USE THIS DATASET Use this dataset to analyze project level timelines for planning projects or to calculate summary metrics related to the planning review and approval processes. The review metric type is defined in the ‘project stage’ column. Note that multiple rounds of completeness check review and resubmission review may occur for a single Planning project. The ‘potential error’ column flags records where data entry errors are likely present. Filter out rows where a value is entered in this column before building summary statistics.
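
    As a sketch, filtering out flagged rows before summarizing could look like this in pandas (the file name is an assumption; the 'potential error' and 'project stage' columns are named in the description above):

      import pandas as pd

      df = pd.read_csv("planning_application_review_metrics.csv")
      clean = df[df["potential error"].isna()]  # drop rows with likely data-entry errors
      print(clean.groupby("project stage").size())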

    E. RELATED DATASETS

    • Planning Department Project Events (coming soon)
    • Planning Department Projects (coming soon)
    • Building Permit Application Issuance Metrics
    • Building Permit Completeness Check Review Metrics
    • Building Permit Application Review Metrics
    • Planning Department Project Application Review Metrics

  13. SEC Public Dataset

    • console.cloud.google.com
    Updated May 14, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-cn (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-cn
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Google: http://google.com/
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  14. CVE-ICU

    • kaggle.com
    zip
    Updated Sep 20, 2025
    Cite
    CVEs (2025). CVE-ICU [Dataset]. https://www.kaggle.com/datasets/pymlkit/cve-icu/suggestions
    Explore at:
    zip (9958823 bytes)
    Dataset updated
    Sep 20, 2025
    Authors
    CVEs
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CVE-ICU is a research project that automatically pulls all CVE data from the NVD and performs fundamental data analysis and graphing.
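
    Pulling a page of CVE records from the NVD, as this project does, can be sketched against the public NVD 2.0 REST API (the pagination values are illustrative; consult NVD's documentation and rate limits before bulk pulls):

      import requests

      url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
      resp = requests.get(url, params={"resultsPerPage": 100, "startIndex": 0}, timeout=30)
      resp.raise_for_status()
      data = resp.json()
      print(data["totalResults"], "CVEs total;", len(data["vulnerabilities"]), "in this page")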

  15. Data from: Semantic Query Analysis from the Global Science Gateway

    • datasearch.gesis.org
    • ssh.datastations.nl
    Updated Jan 23, 2020
    Cite
    Carlesi, Dr. C. (Istituto di Scienze e Tecnologie dell’informazione “A. Faedo”, CNR-ISTI, Italy), DataCollector (2020). Semantic Query Analysis from the Global Science Gateway [Dataset]. http://doi.org/10.17026/dans-25m-fhe2
    Explore at:
    Dataset updated
    Jan 23, 2020
    Dataset provided by
    DANS (Data Archiving and Networked Services)
    Authors
    Carlesi, Dr. C. (Istituto di Scienze e Tecnologie dell’informazione “A. Faedo”, CNR-ISTI, Italy), DataCollector
    Description

    Nowadays web portals play an essential role in searching and retrieving information in several fields of knowledge: they are ever more technologically advanced and designed to support the storage of a huge amount of natural-language information originating from the queries launched by users worldwide.

    A good example is given by the WorldWideScience search engine (the database is available at http://worldwidescience.org/). It is based on a similar gateway, Science.gov, which is the major path to U.S. government science information, as it pulls together Web-based resources from various agencies. The information in the database is intended to be of high quality and authority, as well as the most current available from the participating countries in the Alliance, so users will find that the results are more refined than those from a general Google search. It covers the fields of medicine, agriculture, the environment, and energy, as well as basic sciences. Most of the information may be obtained free of charge (the database itself may be used free of charge) and is considered "open domain." As of this writing, there are about 60 countries participating in WorldWideScience.org, providing access to 50+ databases and information portals. Not all content is in English. (Bronson, 2009)

    Given this scenario, we focused on building a corpus constituted by the query logs registered by the GreyGuide (Repository and Portal to Good Practices and Resources in Grey Literature) and received by the WorldWideScience.org (The Global Science Gateway) portal: the aim is to retrieve information related to social media, which today represent a considerable source of data more and more widely used for research ends. This project includes eight months of query logs registered between July 2017 and February 2018, for a total of 445,827 queries. The analysis mainly concentrates on the semantics of the queries received from the portal clients: it is a process of information retrieval from a rich digital catalogue whose language is dynamic, is evolving, and follows, as well as reflects, the cultural changes of our modern society.

  16. Redfin usa properties dataset

    • crawlfeeds.com
    csv, zip
    Updated Jun 13, 2025
    Cite
    Crawl Feeds (2025). Redfin usa properties dataset [Dataset]. https://crawlfeeds.com/datasets/redfin-usa-properties-dataset
    Explore at:
    zip, csv
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Explore the Redfin USA Properties Dataset, available in CSV format. This extensive dataset provides valuable insights into the U.S. real estate market, including detailed property listings, prices, property types, and more across various states and cities. Perfect for those looking to conduct in-depth market analysis, real estate investment research, or financial forecasting.

    Key Features:

    • Comprehensive Property Data: Includes essential details such as listing prices, property types, square footage, and the number of bedrooms and bathrooms.
    • Geographic Coverage: Encompasses a wide range of U.S. states and cities, providing a broad view of the national real estate market.
    • Historical Trends: Analyze past market data to understand price movements, regional differences, and market trends over time.
    • Geo-Location Details: Enables spatial analysis and mapping by including precise geographical coordinates of properties.

    Who Can Benefit From This Dataset:

    • Real Estate Investors: Identify lucrative opportunities by analyzing property values, market trends, and regional price variations.
    • Market Analysts: Gain a deeper understanding of the U.S. housing market dynamics to inform research and reporting.
    • Data Scientists and Researchers: Leverage detailed real estate data for modeling, urban studies, or economic analysis.
    • Financial Analysts: Utilize the dataset for financial modeling, helping to predict market behavior and assess investment risks.

    Download the Redfin USA Properties Dataset to access essential information on the U.S. housing market, ideal for professionals in real estate, finance, and data analytics. Unlock key insights to make informed decisions in a dynamic market environment.

    Looking for deeper insights or a custom data pull from Redfin?
    Send a request with just one click and explore detailed property listings, price trends, and housing data.
    🔗 Request Redfin Real Estate Data

  17. EZH2 RNA pull down_raw data

    • figshare.com
    Updated Jun 3, 2025
    Cite
    Zhijian Zhu (2025). EZH2 RNA pull down_raw data [Dataset]. http://doi.org/10.6084/m9.figshare.29220899.v1
    Explore at:
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Zhijian Zhu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EZH2 is essential for the development of germinal center B cells. The expression of EZH2 is regulated by alternative splicing. However, the regulatory factors that regulate EZH2 alternative splicing in B cells are still unclear. We used a key intron of EZH2 to perform RNA pull-down combined with protein profiling analysis to try to find more regulatory factors of EZH2 alternative splicing.

  18. Global export data of Cabinet Pulls

    • volza.com
    csv
    Updated Nov 19, 2025
    Cite
    Volza FZ LLC (2025). Global export data of Cabinet Pulls [Dataset]. https://www.volza.com/exports-global/global-export-data-of-cabinet+pulls
    Explore at:
    csv
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of exporters, Sum of export value, 2014-01-01/2021-09-30, Count of export shipments
    Description

    16,084 global export shipment records of Cabinet Pulls, with prices, volume, and current buyer-supplier relationships, based on an actual global export trade database.

  19. Global import data of Cabinet Pulls

    • volza.com
    csv
    Updated Nov 14, 2025
    Cite
    Volza FZ LLC (2025). Global import data of Cabinet Pulls [Dataset]. https://www.volza.com/imports-france/france-import-data-of-cabinet+pulls-from-na
    Explore at:
    csv
    Dataset updated
    Nov 14, 2025
    Dataset authored and provided by
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    48 global import shipment records of Cabinet Pulls, with prices, volume, and current buyer-supplier relationships, based on an actual global trade database.

  20. McDonald's Stock Daily Updated

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    The Hidden Layer (2025). McDonald's Stock Daily Updated [Dataset]. https://www.kaggle.com/datasets/isaaclopgu/mcdonalds-stock-daily-updated
    Explore at:
    zip (571304 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    The Hidden Layer
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this Dataset

    This dataset offers a comprehensive, up-to-date look at the historical stock performance of McDonald's Corporation (MCD), one of the world's most recognizable and valuable fast-food brands.

    About the Company

    McDonald's Corporation is an American multinational fast-food company founded in 1940. Headquartered in Chicago, Illinois, the company is the world's largest restaurant chain by revenue, with a presence in over 100 countries. McDonald's is a major component of the Dow Jones Industrial Average and the S&P 500, making its stock a key indicator for the health of the consumer discretionary sector and global consumer spending.

    Key Features

    Daily OHLCV Data: The dataset contains essential Open, High, Low, Close, and Volume metrics for each trading day.

    Comprehensive History: Includes data from McDonald's early trading history to the present, offering a long-term perspective.

    Regular Updates: The dataset is designed for regular, automated updates to ensure data freshness for time-sensitive projects.

    Data Dictionary

    Date: The date of the trading session in YYYY-MM-DD format.

    ticker: The standard ticker symbol for McDonald's Corporation on the NYSE: 'MCD'.

    name: The full name of the company: 'McDonald's Corporation'.

    Open: The stock price in USD at the start of the trading session.

    High: The highest price reached during the trading day in USD.

    Low: The lowest price recorded during the trading day in USD.

    Close: The final stock price at market close in USD.

    Volume: The total number of shares traded on that day.

    Data Collection

    The data for this dataset is collected using the yfinance Python library, which pulls information directly from the Yahoo Finance API.
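
    A daily refresh matching this dataset's schema could be sketched as follows (assumed, based on the data dictionary above; not the author's exact script):

      import yfinance as yf

      # Full daily OHLCV history for MCD, reshaped to Date/ticker/name + OHLCV.
      mcd = yf.Ticker("MCD").history(period="max", interval="1d").reset_index()
      mcd["ticker"] = "MCD"
      mcd["name"] = "McDonald's Corporation"
      mcd = mcd[["Date", "ticker", "name", "Open", "High", "Low", "Close", "Volume"]]
      mcd.to_csv("mcd_daily.csv", index=False)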

    Potential Use Cases

    Financial Analysis: Analyze historical price trends, volatility, and trading volume of McDonald's stock.

    Machine Learning: Develop and test models for stock price prediction and time series forecasting.

    Comparative Analysis: Compare McDonald's performance with other companies in the sector.

    Educational Projects: A perfect real-world dataset for students and data enthusiasts to practice data cleaning, visualization, and modeling.
