19 datasets found
  1. Yahoo Password Frequency Corpus

    • figshare.com
    application/gzip
    Updated May 30, 2023
    Joseph Bonneau (2023). Yahoo Password Frequency Corpus [Dataset]. http://doi.org/10.6084/m9.figshare.2057937.v1
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Joseph Bonneau
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes sanitized password frequency lists collected from Yahoo in May 2011. For details of the original collection experiment, please see: Bonneau, Joseph. "The science of guessing: analyzing an anonymized corpus of 70 million passwords." IEEE Symposium on Security & Privacy, 2012. http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf

    This data has been modified to preserve differential privacy. For details of this modification, please see: Jeremiah Blocki, Anupam Datta and Joseph Bonneau. "Differentially Private Password Frequency Lists." Network & Distributed Systems Symposium (NDSS), 2016. http://www.jbonneau.com/doc/BDB16-NDSS-pw_list_differential_privacy.pdf

    Each of the 51 .txt files represents one subset of all users' passwords observed during the experiment period. "yahoo-all.txt" includes all users; every other file represents a strict subset of that group.

    Each file is a series of lines of the format:

        FREQUENCY #OBSERVATIONS
        ...

    with FREQUENCY in descending order. For example, the file:

        3 1
        2 1
        1 3

    would represent the frequency list (3, 2, 1, 1, 1), that is, one password observed 3 times, one observed twice, and three separate passwords observed once each.
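
    A minimal Python sketch of reading one of these files, assuming each line holds two whitespace-separated integers (FREQUENCY and #OBSERVATIONS) as described above. All helper names are hypothetical. `beta_success_rate` is an illustrative implementation of the β-success rate (the share of accounts an attacker would compromise by guessing the β most common passwords), a guessing metric discussed in the cited 2012 paper; because the lists were perturbed for differential privacy, any figure computed this way only approximates the original data.

```python
from typing import Iterable, List, Tuple


def parse_frequency_lines(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'FREQUENCY #OBSERVATIONS' lines into (frequency, observations) pairs."""
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields:  # tolerate blank lines
            continue
        frequency, observations = int(fields[0]), int(fields[1])
        pairs.append((frequency, observations))
    return pairs


def expand_frequency_list(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand the pairs into the full frequency list, one entry per distinct password."""
    freqs: List[int] = []
    for frequency, observations in pairs:
        freqs.extend([frequency] * observations)
    return freqs


def beta_success_rate(freqs: List[int], beta: int) -> float:
    """Fraction of all accounts whose password is among the beta most frequent ones."""
    total_accounts = sum(freqs)
    top = sorted(freqs, reverse=True)[:beta]
    return sum(top) / total_accounts


if __name__ == "__main__":
    example = ["3 1", "2 1", "1 3"]  # the example file from the description
    freqs = expand_frequency_list(parse_frequency_lines(example))
    print(freqs)                        # [3, 2, 1, 1, 1]
    print(sum(freqs), len(freqs))       # 8 accounts, 5 distinct passwords
    print(beta_success_rate(freqs, 1))  # 0.375
```

    For a real file, an open text file object (after extracting the downloaded archive) can be passed straight to `parse_frequency_lines`. The `sorted` call is redundant when the file is already in descending order, but it keeps the function correct for arbitrary input.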

  2. Bitcoin Historical Data (2014-2025) Yahoo! Finance

    • kaggle.com
    Updated Feb 21, 2025
    Eldintaro Farrandi (2025). Bitcoin Historical Data (2014-2025) Yahoo! Finance [Dataset]. https://www.kaggle.com/datasets/eldintarofarrandi/bitcoin-historical-data-2014-2025-yahoo-finance
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Eldintaro Farrandi
    Description

    This dataset includes daily historical price data for Bitcoin (BTC-USD) from 2014 to 2025, obtained by web scraping the Yahoo Finance page with Selenium. The primary data source is the Yahoo Finance "Bitcoin Historical Data" page. The dataset contains daily opening price (Open), highest price (High), lowest price (Low), closing price (Close), adjusted closing price (Adj Close), and trading volume (Volume).
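
    A minimal sketch of computing day-over-day returns from these columns, assuming the CSV also has a Date column in ISO YYYY-MM-DD format (the description does not name the date column or its format). Because the data was scraped from a web page, the helper also strips thousands separators as a precaution; rows are sorted oldest first, since a scraped history table may list the newest day first. All function names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_price_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into rows keyed by column name, oldest first (assumes ISO dates)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: row["Date"])  # ISO dates sort chronologically as strings
    return rows


def to_number(cell: str) -> float:
    """Convert a scraped cell to float, dropping thousands separators such as '50,000'."""
    return float(cell.replace(",", ""))


def daily_returns(rows: List[Dict[str, str]], column: str = "Adj Close") -> List[float]:
    """Simple day-over-day returns: price[t] / price[t-1] - 1."""
    prices = [to_number(row[column]) for row in rows]
    return [current / previous - 1.0 for previous, current in zip(prices, prices[1:])]


# Hypothetical two-day sample, deliberately listed newest first.
sample = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,"40,000","51,000","39,500","50,000","50,000",1000
2024-01-01,"38,000","41,000","37,500","40,000","40,000",900
"""
print(daily_returns(load_price_rows(sample)))  # [0.25]
```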

    About Bitcoin: Bitcoin (BTC) is the world's first decentralized digital currency, introduced in 2009 by an anonymous creator known as Satoshi Nakamoto. It operates on a peer-to-peer network powered by blockchain technology, enabling secure, transparent, and trustless transactions without the need for intermediaries like banks. Bitcoin's limited supply of 21 million coins and its growing adoption have made it a popular asset for investment, trading, and as a hedge against inflation.

    We are excited to share this dataset and look forward to seeing the insights it can provide. We hope it will inspire collaboration and innovation within the community. By leveraging this daily data, we can explore trends, develop predictive models, and design innovative trading strategies that deepen our understanding of Bitcoin's market behavior. Together, we can unlock new opportunities and contribute to the collective advancement of cryptocurrency research and analysis.

  3. commoncatalog-cc-by-nc

    • huggingface.co
    Updated May 16, 2024
    CommonCanvas (2024). commoncatalog-cc-by-nc [Dataset]. https://huggingface.co/datasets/common-canvas/commoncatalog-cc-by-nc
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    May 16, 2024
    Dataset authored and provided by
    CommonCanvas
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for CommonCatalog CC-BY-NC

    This dataset is a large collection of high-resolution Creative Commons images (under several different licenses; see Table 1 in the Appendix of the paper) collected in 2014 from users of Yahoo Flickr. The dataset contains images of up to 4K resolution, making it one of the highest-resolution captioned image datasets.

      Dataset Details

      Dataset Description

    We provide synthetic captions for approximately 100 million high… See the full description on the dataset page: https://huggingface.co/datasets/common-canvas/commoncatalog-cc-by-nc.

  4. Zephyrhills Od Dataset

    • universe.roboflow.com
    zip
    Updated May 5, 2022
    jkesguerra2050@yahoo.com (2022). Zephyrhills Od Dataset [Dataset]. https://universe.roboflow.com/jkesguerra2050-yahoo-com/zephyrhills-od-kqxrt
    Available download formats: zip
    Dataset updated
    May 5, 2022
    Dataset provided by
    Yahoo! (https://tw.yahoo.com/)
    Authors
    jkesguerra2050@yahoo.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Zephyrhills
    Variables measured
    Bottles Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Recycling Initiatives: This model can be used in smart waste segregation systems to automatically identify and sort different types of plastic bottles, cans, and other recyclables. This could save significant manual labor and increase overall recycling efficiency.

    2. Retail Inventory Management: The model could be used in supermarkets or stores to autonomously monitor their inventory. By identifying different types of bottles and other items, the system could keep track of what's in stock and needs replenishment, especially within grocery stores or beverage industry retailers.

    3. Pollution Monitoring: Environmental organizations could use this model for monitoring plastic pollution in public spaces, oceans, or beaches. By recognizing specific brands and kinds of bottles, data could be accumulated to hold companies accountable for their environmental footprints.

    4. Brand Strategy Analysis: Companies could use this model to analyze the presence and positioning of their products in various scenarios (like events, homes, public spaces). They could track consumption patterns, target demographics, and even assess the impact of branding campaigns.

    5. Customized Beverage Vending Machines: Vending machines could use this model to provide a unique user experience. Instead of standard buttons, users could hold up the bottle or can they want, and the machine could recognize the object and dispense the corresponding beverage.

  5. Number of data compromises and impacted individuals in U.S. 2005-2024

    • statista.com
    Updated Jul 14, 2025
    Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in each, sensitive data is accessed by an unauthorized threat actor.

    Industries most vulnerable to data breaches

    Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States, such as healthcare, has gradually increased within the past few years, while other sectors saw a decrease.

    Largest data exposures worldwide

    In 2020, the adult streaming website CAM4 experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then in 2017 revised the number of leaked records to three billion. The third-biggest data breach happened in March 2018 and involved India's national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.

  6. All-time biggest online data breaches 2025

    • statista.com
    Updated May 26, 2025
    Statista (2025). All-time biggest online data breaches 2025 [Dataset]. https://www.statista.com/statistics/290525/cyber-crime-biggest-online-data-breaches-worldwide/
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025
    Area covered
    Worldwide
    Description

    The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation it updated the number, revealing that three billion accounts were affected. The National Public Data breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, security professionals estimate that nearly three billion personal records were leaked. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included identification numbers and biometric information such as fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.

    Cybercrime - the dark side of digitalization

    As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022, a marked increase from the 447 cases reported a decade earlier.

    The high price of data protection

    As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. Breaches were most costly in the healthcare sector, where each one reportedly cost the affected party about 10.1 million U.S. dollars. The financial sector came next, with each breach resulting in a loss of approximately 6 million U.S. dollars, about 1.5 million more than the global average.

  7. Understanding the Psychology of Guilt – Why Do People (not) Share Guilt with...

    • b2find.eudat.eu
    Updated Sep 6, 2022
    (2022). Understanding the Psychology of Guilt – Why Do People (not) Share Guilt with Others? – Chapter 5 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/18dca9b7-91ea-585c-841b-ae81502adeca
    Dataset updated
    Sep 6, 2022
    Description

    Do people share their feelings of guilt with others and, if so, what are their reasons for doing or not doing so? Even though the social sharing of negative emotional experiences, such as regret, has been studied extensively, little is known about the sharing of guilt. We report three studies on the sharing of guilt. In Study 1, we re-analyzed data about guilt experiences posted on the social website “Yahoo Answers” and found that people share both intrapersonal and interpersonal guilt experiences with others online. Study 2 found that the main motivations for sharing guilt (compared with sharing regret) were "venting", "clarification and meaning", and "gaining advice". Study 3 found that people were more likely to share experiences of interpersonal guilt and more likely to keep experiences of intrapersonal guilt to themselves. Together, these studies contribute to the understanding of the social sharing of guilt.

    Additional documentation and metadata can be found in the files Data Report Chapter 5XLZ.pdf, Documentation of all author responsibilities.pdf, and the metadata files in the rawdata folders. All materials, hypotheses, and sample sizes were preregistered at https://aspredicted.org/blind.php?x=md5f3b (Study 2) and https://aspredicted.org/blind.php?x=ay7vk9 (Study 3). The present data package includes raw data files (raw data and metadata information, both in Excel), a syntax file (SPSS), and materials (questionnaires in PDF from MTurk).

  8. $TQQQ 5 Year Data

    • kaggle.com
    Updated May 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bret Mathyer (2023). $TQQQ 5 Year Data [Dataset]. https://www.kaggle.com/datasets/bretmathyer/tqqq-5-year-data-yahoo-finance
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    May 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bret Mathyer
    Description

    $TQQQ is a leveraged fund that seeks to return three times (3x) the daily performance of the Nasdaq-100 index.

    Data from May 28, 2018 - May 28, 2023

    TQQQ

    Data has 7 columns: Date, Open, High, Low, Close, Adj. Close, Volume.

    Dividends and stock splits, which could otherwise interfere with the price rows, are provided in separate files:

    TQQQ_Dividends

    Data has 2 columns: Date, Dividends.

    TQQQ_Stock_Splits

    Data has 2 columns: Date, Stock Splits.
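
    Since prices, dividends, and splits live in three separate files, a natural first step is to join them on the date. A minimal sketch, assuming each file is read as CSV text with the headers listed above and that split values are either plain numbers or ratios such as "2:1" (the actual formatting is not stated in this description). The helper names are hypothetical; the `splits_date_column` parameter covers the case where the splits file header literally reads "Data" rather than "Date".

```python
import csv
import io
from typing import Dict, List


def parse_event_value(text: str) -> float:
    """Accept plain numbers ('0.01', '2.0') or ratios written as 'a:b' ('2:1' -> 2.0)."""
    if ":" in text:
        numerator, denominator = text.split(":", 1)
        return float(numerator) / float(denominator)
    return float(text)


def events_by_date(csv_text: str, value_column: str, date_column: str = "Date") -> Dict[str, float]:
    """Map each event date to its numeric value (dividend amount or split ratio)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_column]: parse_event_value(row[value_column]) for row in reader}


def attach_events(prices_text: str, dividends_text: str, splits_text: str,
                  splits_date_column: str = "Date") -> List[Dict[str, object]]:
    """Copy each price row and add 'Dividends' and 'Stock Splits' (0.0 on days without an event)."""
    dividends = events_by_date(dividends_text, "Dividends")
    splits = events_by_date(splits_text, "Stock Splits", date_column=splits_date_column)
    merged: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(prices_text)):
        record: Dict[str, object] = dict(row)
        record["Dividends"] = dividends.get(row["Date"], 0.0)
        record["Stock Splits"] = splits.get(row["Date"], 0.0)
        merged.append(record)
    return merged
```

    Defaulting to 0.0 keeps every trading day in the result, which is the shape most backtesting code expects; rows on event days simply carry a non-zero value.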

  9. Alpha Insights: US Funds

    • kaggle.com
    Updated Feb 12, 2024
    willian oliveira gibin (2024). Alpha Insights: US Funds [Dataset]. http://doi.org/10.34740/kaggle/dsv/7614015
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    US Funds Dataset: Unlocking Insights for Informed Investment Decisions

    Exchange-Traded Funds (ETFs) have gained significant popularity in recent years as a low-cost alternative to Mutual Funds. This dataset, compiled from Yahoo Finance, offers a comprehensive overview of the US funds market, encompassing 23,783 Mutual Funds and 2,310 ETFs.

    Data

    The dataset provides a wealth of information on each fund, including:

    • General fund aspects: total net assets, fund family, inception date, expense ratios, and more.
    • Portfolio indicators: cash allocation, sector weightings, holdings diversification, and other key metrics.
    • Historical returns: year-to-date, 1-year, 3-year, and other performance data for different time periods.
    • Financial ratios: price/earnings ratio, Treynor and Sharpe ratios, alpha, beta, and ESG scores.

    Applications

    This dataset can be leveraged by investors, researchers, and financial professionals for a variety of purposes, including:

    • Investment analysis: comparing the performance and characteristics of Mutual Funds and ETFs to make informed investment decisions.
    • Portfolio construction: using the data to build diversified portfolios that align with investment goals and risk tolerance.
    • Research and analysis: studying market trends, fund behavior, and other factors to gain insights into the US funds market.

    Inspiration and Updates

    The dataset was inspired by the surge of interest in ETFs in 2017 and the subsequent shift away from Mutual Funds. The data is sourced from Yahoo Finance, a publicly available website, ensuring transparency and accessibility. Updates are planned every 1-2 semesters to keep the data current and relevant.

    Conclusion

    This comprehensive dataset offers a valuable resource for anyone seeking to gain a deeper understanding of the US funds market. By providing detailed information on a wide range of funds, the dataset empowers investors to make informed decisions and build successful investment portfolios.

    Access the dataset and unlock the insights it offers to make informed investment decisions.

  10. US Funds dataset from Yahoo Finance

    • kaggle.com
    zip
    Updated Oct 15, 2021
    Stefano Leone (2021). US Funds dataset from Yahoo Finance [Dataset]. https://www.kaggle.com/datasets/stefanoleone992/mutual-funds-and-etfs/versions/3/code
    Available download formats: zip (361,045,735 bytes)
    Dataset updated
    Oct 15, 2021
    Authors
    Stefano Leone
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    ETFs are a cheap alternative to Mutual Funds and have grown fast over the last decade. Is the 2017 hype around ETFs confirmed by good returns in 2018? The updated version reflects financial values as of October 2021.

    Content

    The file contains 24,821 Mutual Funds and 1,680 ETFs with general aspects (such as Total Net Assets, management company, and size), portfolio indicators (such as cash, stocks, bonds, and sectors), returns (such as year_to_date and 2020-11), and financial ratios (such as price/earnings, Treynor and Sharpe ratios, alpha, and beta).
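
    The Sharpe and Treynor ratios listed above both measure excess return over a risk-free rate per unit of risk: Sharpe divides by the standard deviation of returns, Treynor by the fund's beta. A minimal sketch of these textbook definitions, assuming a list of periodic returns and a constant per-period risk-free rate. Yahoo Finance's exact methodology (return frequency, look-back window, annualization) is not stated in this description, so values computed this way should not be expected to reproduce the dataset's columns.

```python
from statistics import mean, stdev
from typing import List


def sharpe_ratio(returns: List[float], risk_free: float) -> float:
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)  # stdev needs at least two observations


def treynor_ratio(returns: List[float], risk_free: float, beta: float) -> float:
    """Mean excess return divided by the portfolio's beta to the market."""
    return (mean(returns) - risk_free) / beta


monthly = [0.02, 0.04, 0.06]  # hypothetical monthly fund returns
print(sharpe_ratio(monthly, 0.01))             # about 1.5
print(treynor_ratio(monthly, 0.01, beta=1.2))  # about 0.025
```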

    Acknowledgements

    Data has been scraped from the publicly available website https://finance.yahoo.com.

    Inspiration

    Datasets allow for multiple comparisons regarding portfolio decisions from investment managers in Mutual Funds and portfolio restrictions to the indexes in ETFs. The inspiration comes from the 2017 hype regarding ETFs, that convinced many investors to buy shares of Exchange Traded Funds rather than Mutual Funds. Datasets will be updated every one or two semesters, hopefully with additional information scraped from Morningstar.com.

  11. General Electric Stock Company Dataset

    • kaggle.com
    Updated Aug 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Achmad Bauravindah (2022). General Electric Stock Company Dataset [Dataset]. https://www.kaggle.com/datasets/achmadbauravindah/general-electric-company-dataset/versions/2
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Achmad Bauravindah
    Description

    Stock data for General Electric Company, recorded by Yahoo. You can use this data for regression problems.

  12. Stock Market Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2020
    Oleh Onyshchak (2020). Stock Market Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1054465
    Available download formats: zip (547,714,524 bytes)
    Dataset updated
    Apr 2, 2020
    Authors
    Oleh Onyshchak
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.

    It contains prices up to April 1, 2020. If you need more up-to-date data, fork the dataset and re-run the data collection script, which is also available on Kaggle.

    Data Structure

    The data for every symbol is saved in CSV format with common fields:

    • Date - specifies trading date
    • Open - opening price
    • High - maximum price during the day
    • Low - minimum price during the day
    • Close - close price adjusted for splits
    • Adj Close - close price adjusted for both dividends and splits
    • Volume - the number of shares that changed hands during a given day

    All ticker data is stored in either the ETFs or the stocks folder, depending on its type, and each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains additional metadata for each ticker, such as its full name.
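
    Because Close is adjusted only for splits while Adj Close is adjusted for both dividends and splits, the ratio Adj Close / Close isolates the dividend adjustment for each row. A minimal sketch, assuming the folders are named `stocks` and `etfs` and each file is `<SYMBOL>.csv` with the fields listed above; the folder names and helper functions are hypothetical, so check them against the downloaded archive.

```python
import csv
from pathlib import Path
from typing import List, Sequence, Tuple


def find_ticker_file(root: str, symbol: str, folders: Sequence[str] = ("stocks", "etfs")) -> Path:
    """Return the CSV path for a symbol, searching each candidate folder in turn."""
    for folder in folders:
        candidate = Path(root) / folder / f"{symbol}.csv"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{symbol}.csv not found under {root} in {list(folders)}")


def dividend_factors(path: Path) -> List[Tuple[str, float]]:
    """Per-row Adj Close / Close ratio; values below 1.0 typically reflect later dividends."""
    factors = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            if row["Close"] and row["Adj Close"]:  # skip rows with missing prices
                factors.append((row["Date"], float(row["Adj Close"]) / float(row["Close"])))
    return factors
```

    Searching both folders keeps callers from needing to know in advance whether a symbol is a stock or an ETF; symbols_valid_meta.csv could be used instead if it records the type.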

  13. Top 3000+ Cryptocurrency Dataset

    • kaggle.com
    Updated Apr 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sourav Banerjee (2023). Top 3000+ Cryptocurrency Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/cryptocurrency-dataset-2021-395-types-of-crypto
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    Apr 9, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakes gain additional ownership in the token over time via network fees, newly minted tokens, or other such reward mechanisms.

    Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control, as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance, or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.

    A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.

    Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.

    Content

    This dataset is a collection of records of 3000+ different cryptocurrencies:

    • Top 395+ from 2021
    • Top 3000+ from 2023

    Structure of the Dataset

    [Image: structure of the dataset (https://i.imgur.com/qGVJaHl.png)]

    Acknowledgements

    The data was collected from https://finance.yahoo.com/, where you can learn more.

    Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/

  14. Sentiment Analysis on Financial Tweets

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
    Available download formats: zip (2,538,259 bytes)
    Dataset updated
    Sep 5, 2019
    Authors
    Vivek Rathi
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.

    The rest of this description was written by David Wallach; using this information, I performed my first-ever sentiment analysis.

    "I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my Raspberry Pi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

    I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
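
    The filtering step described in the quote above (keeping a tweet only if it mentions a company from stocks_cleaned.csv) could look roughly like the following sketch. It is not StockerBot's actual logic: the ticker-to-name mapping is assumed to come from that file, the helper names are hypothetical, and matching only cashtags (`$AAPL`) and full company names is an illustrative choice, since bare tickers such as "A" or "IT" collide with ordinary words.

```python
import re
from typing import Dict, List


def build_patterns(companies: Dict[str, str]) -> Dict[str, re.Pattern]:
    """Compile one case-insensitive pattern per ticker matching '$TICKER' or the company name."""
    patterns = {}
    for ticker, name in companies.items():
        alternatives = rf"\${re.escape(ticker)}|{re.escape(name)}"
        # (?<!\w) and (?!\w) act as word boundaries that also work next to '$' and '.'
        patterns[ticker] = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)
    return patterns


def companies_mentioned(tweet: str, patterns: Dict[str, re.Pattern]) -> List[str]:
    """Return the sorted tickers whose pattern occurs in the tweet."""
    return sorted(ticker for ticker, pattern in patterns.items() if pattern.search(tweet))


patterns = build_patterns({"AAPL": "Apple", "MSFT": "Microsoft"})  # hypothetical subset
print(companies_mentioned("Apple beats estimates; $MSFT flat", patterns))  # ['AAPL', 'MSFT']
print(companies_mentioned("pineapple futures", patterns))                  # []
```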

    Content

    This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'

    Acknowledgements

    The data used here is gathered from a project I developed: https://github.com/dwallach1/StockerBot

    Inspiration

    I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency).

  15. Meta updated stocks complete dataset

    • kaggle.com
    Updated Mar 15, 2025
    M Atif Latif (2025). Meta updated stocks complete dataset [Dataset]. https://www.kaggle.com/datasets/matiflatif/meta-stocks-complete-data-set
    Available formats: Croissant (see mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    M Atif Latif
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains daily stock data for Meta Platforms, Inc. (META), formerly Facebook Inc., from May 19, 2012, to January 20, 2025. It offers a comprehensive view of Meta’s stock performance and market fluctuations during a period of significant growth, acquisitions, and technological advancements. This dataset is valuable for financial analysis, market prediction, machine learning projects, and evaluating the impact of Meta’s business decisions on its stock price.

    Content

    The dataset includes the following key features:

    • Open: Stock price at the start of the trading day.
    • High: Highest stock price during the trading day.
    • Low: Lowest stock price during the trading day.
    • Close: Stock price at the end of the trading day.
    • Adj Close: Adjusted closing price, accounting for corporate actions like stock splits, dividends, and other financial adjustments.
    • Volume: Total number of shares traded during the trading day.

    Variables

    • Date: The date of the trading day, formatted as YYYY-MM-DD.
    • Open: The stock price at the start of the trading day.
    • High: The highest price reached by the stock during the trading day.
    • Low: The lowest price reached by the stock during the trading day.
    • Close: The stock price at the end of the trading day.
    • Adj Close: The adjusted closing price, which reflects corporate actions like stock splits and dividend payouts.
    • Volume: The total number of shares traded on that specific day.
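
    The variable definitions above imply simple consistency rules for each row: High is at least the larger of Open and Close, Low is at most the smaller of the two, and Volume is non-negative. A minimal validation sketch, assuming rows are dictionaries of strings keyed by the column names listed above (for example, as produced by Python's `csv.DictReader`); the function name and sample rows are hypothetical.

```python
from typing import Dict, Iterable, List


def ohlc_violations(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return the dates of rows whose prices or volume break basic OHLCV invariants."""
    bad_dates = []
    for row in rows:
        open_, high, low, close = (float(row[k]) for k in ("Open", "High", "Low", "Close"))
        volume = float(row["Volume"])
        if high < max(open_, close) or low > min(open_, close) or volume < 0:
            bad_dates.append(row["Date"])
    return bad_dates


rows = [
    {"Date": "2024-01-02", "Open": "350", "High": "355", "Low": "345", "Close": "352", "Volume": "1000"},
    {"Date": "2024-01-03", "Open": "352", "High": "351", "Low": "348", "Close": "350", "Volume": "900"},
]
print(ohlc_violations(rows))  # ['2024-01-03']
```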

    Acknowledgements

    This dataset was sourced from reliable public APIs such as Yahoo Finance or Alpha Vantage. It is provided for educational and research purposes and is not affiliated with Meta Platforms, Inc. Users are encouraged to adhere to the terms of use of the original data provider.

  16. Berkshire Hathaway - Stock - Latest and Updated

    • kaggle.com
    Updated Jul 5, 2025
    Kalilur Rahman (2025). Berkshire Hathaway - Stock - Latest and Updated [Dataset]. https://www.kaggle.com/datasets/kalilurrahman/berkshire-hathaway-stock-latest-and-updated/versions/184
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    Kaggle
    Authors
    Kalilur Rahman
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Berkshire Hathaway Inc. is an American multinational conglomerate holding company headquartered in Omaha, Nebraska, United States. The company wholly owns GEICO, Duracell, Dairy Queen, BNSF, Lubrizol, Fruit of the Loom, Helzberg Diamonds, Long & Foster, FlightSafety International, Shaw Industries, Pampered Chef, Forest River, and NetJets. It also owns 38.6% of Pilot Flying J and holds significant minority stakes in the public companies Kraft Heinz Company, American Express, The Coca-Cola Company, Bank of America, and Apple. Beginning in 2016, the company acquired large holdings in the major US airlines (United Airlines, Delta Air Lines, Southwest Airlines, and American Airlines) but sold all of its airline holdings in early 2020. Since 1965, Berkshire Hathaway has averaged 19.0% annual growth in book value for its shareholders while employing large amounts of capital and minimal debt. The company is known for its control and leadership by Warren Buffett, its chairman and chief executive, and Charlie Munger, its longtime vice chairman.

  17. S&P 500 Daily Data (1927-12-30 to 2021-09-19)

    • kaggle.com
    Updated Sep 19, 2021
    Cite
    Myungchan Kim (2021). S&P 500 Daily Data (1927-12-30 to 2021-09-19) [Dataset]. https://www.kaggle.com/datasets/myungchankim/sp-500-daily-data-19281230-to-20210919
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Kaggle
    Authors
    Myungchan Kim
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is the data?

    Daily opening and closing prices, highest and lowest prices, and volume for the S&P 500 index, taken from Yahoo Finance.

    How was the dataset compiled?

    Code: GitHub. The data was imported directly from Yahoo Finance with the yfinance library (GitHub), and some processing was applied afterwards.

    Quality of data

    Nearly all open prices between 1962-01-01 and 1982-04-10 were missing. For these days, the open price was assumed to equal the closing price of the previous trading day.
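    The fill rule above can be sketched in pandas as follows. This is not the author's GitHub code; it assumes a DataFrame sorted by date in ascending order with Open and Close columns, and `fill_missing_open` is a hypothetical helper name. The sample values are made up.

```python
import numpy as np
import pandas as pd


def fill_missing_open(df):
    """Replace missing Open values with the previous trading day's Close."""
    out = df.copy()
    prev_close = out["Close"].shift(1)  # previous row = previous trading day
    out["Open"] = out["Open"].fillna(prev_close)  # aligned on the date index
    return out


# Made-up sample: the two middle rows have no recorded open price.
df = pd.DataFrame(
    {"Open": [10.0, np.nan, np.nan, 12.5], "Close": [10.5, 11.0, 12.0, 12.8]},
    index=pd.to_datetime(["1962-01-02", "1962-01-03", "1962-01-04", "1962-01-05"]),
)
print(fill_missing_open(df)["Open"].tolist())  # [10.0, 10.5, 11.0, 12.5]
```

    Because the previous close comes from `shift(1)`, a missing Open on the very first row would stay missing.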

    Volume figures until 1949-12-13 are not available.

    Some earlier years have fewer trading days recorded than expected:

    | Year | Number of Trading Days Recorded |
    | --- | --- |
    | 1927 | 1 |
    | 1928 | 195 |
    | 1929 | 199 |
    | 1930 | 155 |
    | 1931 | 183 |
    | 1932 | 169 |
    | 1933 | 136 |
    | 1934 | 91 |
    | 1935 | 83 |
    | 1936 | 107 |
    | 1937 | 83 |
    | 1938 | 57 |
    | 1939 | 27 |
    | 1940 | 8 |
    | 1941 | 6 |
    | 1942 | 16 |
    | 1943 | 7 |
    | 1944 | 6 |
    | 1945 | 42 |
    | 1946 | 48 |
    | 1947 | 18 |
    | 1948 | 16 |
    | 1949 | 1 |
    | 1968 | 226 |
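    To check a table like this against the data, one can count the distinct dates recorded in each calendar year. The sketch assumes pandas and a list of date strings; `trading_days_per_year`, `short_years` and the default threshold of 240 days are hypothetical choices for illustration (a full modern US trading year has roughly 252 sessions).

```python
import pandas as pd


def trading_days_per_year(dates):
    """Count the distinct trading dates recorded in each calendar year."""
    idx = pd.DatetimeIndex(pd.to_datetime(dates)).unique()  # drop duplicate dates
    return idx.year.value_counts().sort_index()


def short_years(counts, threshold=240):
    """Return the years with fewer recorded trading days than `threshold`."""
    return counts[counts < threshold]


# Made-up sample dates; the repeated 1928-01-04 is counted once.
dates = ["1927-12-30", "1928-01-03", "1928-01-04", "1928-01-04"]
counts = trading_days_per_year(dates)
print(counts)
print(short_years(counts, threshold=2))
```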

    Added columns for:

     1. Percentage gain/loss: the percentage change between the closing prices of two consecutive trading days.
     2. Price variation percentage: (High - Low) / Close.
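    The two added columns can be reproduced roughly as follows. This sketch assumes pandas, a date-indexed DataFrame with High, Low and Close columns, and hypothetical output column names; it also assumes both measures are multiplied by 100 to read as percentages, which the description does not state explicitly. The sample prices are made up.

```python
import pandas as pd


def add_derived_columns(df):
    """Add gain/loss and price-variation percentages (hypothetical column names)."""
    out = df.sort_index().copy()
    # Percentage change between the closes of two consecutive trading days.
    out["Gain/Loss %"] = out["Close"].pct_change() * 100
    # Intraday range relative to the close, expressed as a percentage.
    out["Price Variation %"] = (out["High"] - out["Low"]) / out["Close"] * 100
    return out


sample = pd.DataFrame(
    {
        "High": [101.0, 112.0, 111.0],
        "Low": [99.0, 105.0, 99.0],
        "Close": [100.0, 110.0, 99.0],
    },
    index=pd.to_datetime(["2021-09-14", "2021-09-15", "2021-09-16"]),
)
print(add_derived_columns(sample).round(2))
```

    The first row's gain/loss is NaN because there is no previous close to compare against.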
    
  18. yahoo_finance_data_nse_2000_stocks

    • kaggle.com
    zip
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stormblessed_Ash (2025). yahoo_finance_data_nse_2000_stocks [Dataset]. https://www.kaggle.com/datasets/ashvinvinodh97/yahoo-finance-data-nse-2000-stocks
    Explore at:
    zip (198144682 bytes): available download format
    Dataset updated
    Apr 11, 2025
    Authors
    Stormblessed_Ash
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset contains daily OHLCV (open, high, low, close, volume) data for about 2,000 Indian stocks listed on the National Stock Exchange (NSE), covering each stock's full available history. The CSV files use multi-index column headers, which must be taken into account when reading and using the data.

    Source: Yahoo Finance
    Type: CSV (all files)
    Currency: INR

    All tickers were collected from: https://www.nseindia.com/market-data/securities-available-for-trading

    If you use pandas, the following utility function reads any of the CSV files:

```
import pandas as pd


def read_ohlcv(filename):
    """Read an OHLCV data file downloaded from yfinance."""
    return pd.read_csv(
        filename,
        skiprows=[0, 1, 2],  # remove the three multi-index header rows that cause trouble
        names=["Date", "Close", "High", "Low", "Open", "Volume"],  # column order in the files
        index_col="Date",
        parse_dates=["Date"],
    )


dataset = read_ohlcv("ABCAPITAL.NS.csv")
```
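    Because the dataset ships about 2,000 separate files, a common next step is to line up one column across many tickers. The sketch below repeats `read_ohlcv` so it is self-contained and adds a hypothetical `load_closes` helper that globs a directory for files ending in `.NS.csv` (an assumption based on the `ABCAPITAL.NS.csv` example name) and outer-joins their Close columns on Date.

```python
from pathlib import Path

import pandas as pd


def read_ohlcv(filename):
    """Read one yfinance-style CSV (three header rows) into a Date-indexed frame."""
    return pd.read_csv(
        filename,
        skiprows=[0, 1, 2],  # skip the multi-index header rows
        names=["Date", "Close", "High", "Low", "Open", "Volume"],
        index_col="Date",
        parse_dates=["Date"],
    )


def load_closes(directory, pattern="*.NS.csv"):
    """Combine the Close column of every matching file into one wide frame.

    `pattern` is an assumed naming convention, not documented by the dataset.
    """
    closes = {}
    for path in sorted(Path(directory).glob(pattern)):
        ticker = path.stem  # "ABCAPITAL.NS.csv" -> "ABCAPITAL.NS"
        closes[ticker] = read_ohlcv(path)["Close"]
    # Building a DataFrame from a dict of Series outer-joins on the Date index.
    return pd.DataFrame(closes).sort_index()
```

    For example, `load_closes("path/to/unzipped/files")` (a placeholder path) returns one column per ticker, with NaN on dates for which a ticker has no row.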

  19. AMD and GOOGLE Stock Price

    • kaggle.com
    zip
    Updated May 12, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gunhee Park (2017). AMD and GOOGLE Stock Price [Dataset]. https://www.kaggle.com/gunhee/amdgoogle
    Explore at:
    zip (81525 bytes): available download format
    Dataset updated
    May 12, 2017
    Authors
    Gunhee Park
    Description

    Context

    Analyzing stock price is interesting.

    Content

    Daily historical prices and volume for AMD and GOOGLE from yahoo.com/finance, covering 5/22/2009 to 5/03/2017. Each stock's data has 7 columns (Date, Open, High, Low, Close, Volume, Adj Close) and 2001 rows, i.e. a shape of (2001, 7).

    Acknowledgements

    Yahoo Finance

    Inspiration

    I want to find the relationship between volume and price.


