CC0 1.0 Universal Public Domain Dedication
https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes sanitized password frequency lists collected from Yahoo in May 2011. For details of the original collection experiment, please see: Bonneau, Joseph. "The science of guessing: analyzing an anonymized corpus of 70 million passwords." IEEE Symposium on Security & Privacy, 2012. http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf

This data has been modified to preserve differential privacy. For details of this modification, please see: Jeremiah Blocki, Anupam Datta and Joseph Bonneau. "Differentially Private Password Frequency Lists." Network & Distributed Systems Symposium (NDSS), 2016. http://www.jbonneau.com/doc/BDB16-NDSS-pw_list_differential_privacy.pdf

Each of the 51 .txt files represents one subset of all users' passwords observed during the experiment period. "yahoo-all.txt" includes all users; every other file represents a strict subset of that group.

Each file is a series of lines of the format:

FREQUENCY #OBSERVATIONS

...with FREQUENCY in descending order. For example, the file:

3 1
2 1
1 3

would represent the frequency list (3, 2, 1, 1, 1), that is, one password observed 3 times, one observed twice, and three separate passwords observed once each.
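As a quick sketch, the format above can be expanded into a flat frequency list with a few lines of Python (the helper name and the in-memory sample are hypothetical, not part of the dataset):

```python
from io import StringIO

def parse_freq_list(lines):
    """Expand 'FREQUENCY OBSERVATIONS' lines into a flat list of frequencies."""
    freqs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        frequency, observations = int(parts[0]), int(parts[1])
        freqs.extend([frequency] * observations)
    return freqs

# The example file from the description above, as an in-memory stand-in.
sample = StringIO("3 1\n2 1\n1 3\n")
print(parse_freq_list(sample))  # [3, 2, 1, 1, 1]
```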
This dataset includes daily historical price data for Bitcoin (BTC-USD) from 2014 to 2025, obtained through web scraping from the Yahoo Finance page using Selenium. The primary data source can be accessed at Yahoo Finance - Bitcoin Historical Data. The dataset contains daily information such as opening price (Open), highest price (High), lowest price (Low), closing price (Close), adjusted closing price (Adj Close), and trading volume (Volume).
About Bitcoin: Bitcoin (BTC) is the world's first decentralized digital currency, introduced in 2009 by an anonymous creator known as Satoshi Nakamoto. It operates on a peer-to-peer network powered by blockchain technology, enabling secure, transparent, and trustless transactions without the need for intermediaries like banks. Bitcoin's limited supply of 21 million coins and its growing adoption have made it a popular asset for investment, trading, and as a hedge against inflation.
We are excited to share this dataset and look forward to seeing the insights it can provide. We hope it will inspire collaboration and innovation within the community. By leveraging this daily data, we can explore trends, develop predictive models, and design innovative trading strategies that deepen our understanding of Bitcoin's market behavior. Together, we can unlock new opportunities and contribute to the collective advancement of cryptocurrency research and analysis.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for CommonCatalog CC-BY-NC
This dataset is a large collection of high-resolution Creative Commons images (composed of different licenses; see Table 1 in the paper's Appendix) collected in 2014 from users of Yahoo Flickr. The dataset contains images of up to 4K resolution, making this one of the highest-resolution captioned image datasets.
Dataset Details
Dataset Description
We provide synthetic captions for approximately 100 million high… See the full description on the dataset page: https://huggingface.co/datasets/common-canvas/commoncatalog-cc-by-nc.
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Recycling Initiatives: This model can be used in smart waste segregation systems to automatically identify and sort different types of plastic bottles, cans, and other recyclables. This could save significant manual labor and increase overall recycling efficiency.
Retail Inventory Management: The model could be used in supermarkets or stores to autonomously monitor their inventory. By identifying different types of bottles and other items, the system could keep track of what's in stock and needs replenishment, especially within grocery stores or beverage industry retailers.
Pollution Monitoring: Environmental organizations could use this model for monitoring plastic pollution in public spaces, oceans, or beaches. By recognizing specific brands and kinds of bottles, data could be accumulated to hold companies accountable for their environmental footprints.
Brand Strategy Analysis: Companies could use this model to analyze the presence and positioning of their products in various scenarios (like events, homes, public spaces). They could track consumption patterns, target demographics, and even assess the impact of branding campaigns.
Customized Beverage Vending Machines: Vending machines could use this model to provide a unique user experience. Instead of standard buttons, users could hold up the bottle or can they want, and the machine could recognize the object and dispense the corresponding beverage.
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three incidents, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This, by far, is the most extensive reported data leakage. This case, though, is unique because cybersecurity researchers found the vulnerability before the cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India's national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, security professionals estimate the leakage of nearly three billion personal records. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.
Cybercrime - the dark side of digitalization
As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze, and store personal data. This, in turn, has led to a rise in the number of cybercrimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.

The high price of data protection
As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.
Do people share their feelings of guilt with others and, if so, what are the reasons for doing or not doing this? Even though social sharing of negative emotional experiences, such as regret, has been extensively studied, not much is known about the sharing of guilt. We report three studies on the sharing of guilt. In Study 1, we re-analyzed data about sharing guilt experiences posted on a social website called "Yahoo Answers," and found that people share intrapersonal as well as interpersonal guilt experiences with others online. Study 2 found that the main motivations for sharing guilt (compared with the sharing of regret) were "venting", "clarification and meaning", and "gaining advice". Study 3 found that people were more likely to share experiences of interpersonal guilt and more likely to keep experiences of intrapersonal guilt to themselves. Together, these studies contribute to the understanding of the social sharing of the emotion of guilt. Additional documentation and metadata can be found in the files Data Report Chapter 5XLZ.pdf, Documentation of all author responsibilities.pdf, and the metadata files in the rawdata folders. This research has preregistered all materials, hypotheses, and sample sizes through: https://aspredicted.org/blind.php?x=md5f3b (for Study 2); https://aspredicted.org/blind.php?x=ay7vk9 (for Study 3). The present data package includes raw data files (raw data + metadata information, both in Excel), a syntax file (SPSS), and materials (questionnaires in PDF from MTurk).
$TQQQ is a leveraged ETF that seeks 3x the daily return of the Nasdaq-100 index.
Data from May 28, 2018 - May 28, 2023
TQQQ
Data has 7 columns: Date, Open, High, Low, Close, Adj. Close, Volume.
Dividend and stock-split events, which may interfere with price analysis, are recorded in the separate files below.
TQQQ_Dividends
Data has 2 columns: Date, Dividends.
TQQQ_Stock_Splits
Data has 2 columns: Date, Stock Splits.
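A minimal pandas sketch of how the split file can be joined to the price file so that split days are flagged before analysis; the column names follow the descriptions above, but the rows are made up:

```python
import pandas as pd
from io import StringIO

# Hypothetical miniature versions of the TQQQ and TQQQ_Stock_Splits files.
prices = pd.read_csv(StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2022-01-12,62.0,63.1,61.5,62.8,62.8,1000000\n"
    "2022-01-13,62.8,63.0,60.2,60.5,60.5,1200000\n"
), parse_dates=["Date"])

splits = pd.read_csv(StringIO(
    "Date,Stock Splits\n"
    "2022-01-13,2.0\n"
), parse_dates=["Date"])

# Left-join so split days can be identified and excluded during analysis.
merged = prices.merge(splits, on="Date", how="left")
clean = merged[merged["Stock Splits"].isna()].drop(columns="Stock Splits")
print(len(clean))  # number of rows with no split event
```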
https://creativecommons.org/publicdomain/zero/1.0/
Exchange-Traded Funds (ETFs) have gained significant popularity in recent years as a low-cost alternative to Mutual Funds. This dataset, compiled from Yahoo Finance, offers a comprehensive overview of the US funds market, encompassing 23,783 Mutual Funds and 2,310 ETFs.
Data
The dataset provides a wealth of information on each fund, including:
General fund aspects: total net assets, fund family, inception date, expense ratios, and more.
Portfolio indicators: cash allocation, sector weightings, holdings diversification, and other key metrics.
Historical returns: year-to-date, 1-year, 3-year, and other performance data for different time periods.
Financial ratios: price/earnings ratio, Treynor and Sharpe ratios, alpha, beta, and ESG scores.

Applications
This dataset can be leveraged by investors, researchers, and financial professionals for a variety of purposes, including:
Investment analysis: comparing the performance and characteristics of Mutual Funds and ETFs to make informed investment decisions.
Portfolio construction: using the data to build diversified portfolios that align with investment goals and risk tolerance.
Research and analysis: studying market trends, fund behavior, and other factors to gain insights into the US funds market.

Inspiration and Updates
The dataset was inspired by the surge of interest in ETFs in 2017 and the subsequent shift away from Mutual Funds. The data is sourced from Yahoo Finance, a publicly available website, ensuring transparency and accessibility. Updates are planned every 1-2 semesters to keep the data current and relevant.
Conclusion
This comprehensive dataset offers a valuable resource for anyone seeking to gain a deeper understanding of the US funds market. By providing detailed information on a wide range of funds, the dataset empowers investors to make informed decisions and build successful investment portfolios.
Access the dataset and unlock the insights it offers to make informed investment decisions.
https://creativecommons.org/publicdomain/zero/1.0/
ETFs represent a cheap alternative to Mutual Funds and have grown fast over the last decade. Was the 2017 hype around ETFs confirmed by good returns in 2018? The updated version reflects October 2021 financial values.
The file contains 24,821 Mutual Funds and 1,680 ETFs with general aspects (such as total net assets, management company, and size), portfolio indicators (such as cash, stocks, bonds, and sectors), returns (such as year-to-date and 2020-11), and financial ratios (such as price/earnings, Treynor and Sharpe ratios, alpha, and beta).
Data has been scraped from the publicly available website https://finance.yahoo.com.
The datasets allow for multiple comparisons regarding portfolio decisions by investment managers in Mutual Funds and portfolio restrictions to the indexes in ETFs. The inspiration comes from the 2017 hype regarding ETFs, which convinced many investors to buy shares of Exchange-Traded Funds rather than Mutual Funds. The datasets will be updated every one or two semesters, hopefully with additional information scraped from Morningstar.com.
This dataset contains historical price data for General Electric Company, as recorded by Yahoo. You can use this data for regression problems.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historic data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to April 1, 2020. If you need more up-to-date data, just fork and re-run the data collection script, also available from Kaggle.
The data for every symbol is saved in CSV format with common fields.
All ticker data is then stored in either the ETFs or stocks folder, depending on its type. Each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
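A sketch of how that folder layout might be navigated with pandas; it assumes symbols_valid_meta.csv carries Symbol and ETF (Y/N) columns, which is an assumption rather than something stated in the description above:

```python
import pandas as pd
from pathlib import Path

def load_ticker(root, symbol, meta):
    """Locate a symbol's CSV in the ETFs/ or stocks/ folder via the metadata table."""
    row = meta.loc[meta["Symbol"] == symbol].iloc[0]
    folder = "ETFs" if row["ETF"] == "Y" else "stocks"  # assumed Y/N flag
    return pd.read_csv(Path(root) / folder / f"{symbol}.csv", parse_dates=["Date"])
```

Usage would be `load_ticker("data", "AAPL", pd.read_csv("data/symbols_valid_meta.csv"))`, with paths adjusted to wherever the dataset was extracted.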
A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakers gain additional ownership in the token over time via network fees, newly minted tokens, or other such reward mechanisms.
Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control, as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted, created prior to issuance, or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.
A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.
Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.
This dataset is a collection of records of 3,000+ different cryptocurrencies:
* Top 395+ from 2021
* Top 3,000+ from 2023
This data is collected from https://finance.yahoo.com/. If you want to learn more, you can visit the website.
Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/
http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.
Everything in the description below was written by David Wallach; using this information, I performed my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']
The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains daily stock data for Meta Platforms, Inc. (META), formerly Facebook Inc., from May 19, 2012, to January 20, 2025. It offers a comprehensive view of Meta’s stock performance and market fluctuations during a period of significant growth, acquisitions, and technological advancements. This dataset is valuable for financial analysis, market prediction, machine learning projects, and evaluating the impact of Meta’s business decisions on its stock price.
The dataset includes the following key features:
Date: The date of the trading day, formatted as YYYY-MM-DD.
Open: The stock price at the start of the trading day.
High: The highest price reached by the stock during the trading day.
Low: The lowest price reached by the stock during the trading day.
Close: The stock price at the end of the trading day.
Adj Close: The adjusted closing price, which reflects corporate actions like stock splits and dividend payouts.
Volume: The total number of shares traded on that specific day.
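For example, a simple daily-return feature can be derived from the Close column with pandas; the prices below are illustrative stand-ins, not actual META quotes:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-16", "2025-01-17", "2025-01-20"]),
    "Close": [611.3, 612.8, 606.7],  # made-up closing prices
}).set_index("Date")

# Daily return: percentage change between consecutive closing prices.
df["Return"] = df["Close"].pct_change()
print(df["Return"].round(4).tolist())
```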
This dataset was sourced from reliable public APIs such as Yahoo Finance or Alpha Vantage. It is provided for educational and research purposes and is not affiliated with Meta Platforms, Inc. Users are encouraged to adhere to the terms of use of the original data provider.
https://creativecommons.org/publicdomain/zero/1.0/
https://creativecommons.org/publicdomain/zero/1.0/
Used data from Yahoo Finance to get daily data for the opening and closing prices, highest and lowest prices, and volume of the S&P 500 index.
Code: GitHub. Used the yfinance library (GitHub) to import data from Yahoo Finance directly. Some processing of the data was done.
All but a few open prices were missing between 1962-01-01 and 1982-04-10. For these, it was assumed that the open price equals the closing price of the previous trading day.
Volume figures until 1949-12-13 are not available.
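The open-price assumption above can be reproduced in pandas with a one-liner; the frame below is a made-up stand-in for the 1962-1982 span:

```python
import numpy as np
import pandas as pd

# Toy frame: the first two opens are missing (values are illustrative).
s = pd.DataFrame({
    "Open": [np.nan, np.nan, 101.5],
    "Close": [100.0, 101.0, 102.0],
})
# Assume a missing open equals the previous trading day's close.
s["Open"] = s["Open"].fillna(s["Close"].shift())
print(s["Open"].tolist())
```

The very first row has no previous close, so it stays missing.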
Some earlier years have fewer than expected trading days:

| Year | Number of Trading Days Recorded |
| --- | --- |
| 1927 | 1 |
| 1928 | 195 |
| 1929 | 199 |
| 1930 | 155 |
| 1931 | 183 |
| 1932 | 169 |
| 1933 | 136 |
| 1934 | 91 |
| 1935 | 83 |
| 1936 | 107 |
| 1937 | 83 |
| 1938 | 57 |
| 1939 | 27 |
| 1940 | 8 |
| 1941 | 6 |
| 1942 | 16 |
| 1943 | 7 |
| 1944 | 6 |
| 1945 | 42 |
| 1946 | 48 |
| 1947 | 18 |
| 1948 | 16 |
| 1949 | 1 |
| 1968 | 226 |
1. Percentage gain/loss (calculated as the percentage difference between the closing prices of two consecutive trading days)
2. Price variation percentage: (High - Low) / Close
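Both derived columns can be sketched in pandas against the OHLC columns described earlier; the values below are made up:

```python
import pandas as pd

# Illustrative two-day frame.
sp = pd.DataFrame({
    "High": [4550.0, 4560.0],
    "Low": [4480.0, 4505.0],
    "Close": [4530.0, 4540.0],
})
# 1. Percentage gain/loss between consecutive closing prices.
sp["PctGainLoss"] = sp["Close"].pct_change() * 100
# 2. Price variation percentage: (High - Low) / Close.
sp["Variation"] = (sp["High"] - sp["Low"]) / sp["Close"]
```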
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains daily OHLCV data for ~2,000 Indian stocks listed on the National Stock Exchange, covering their full available history. The columns are multi-index columns, so this needs to be taken into account when reading and using the data.
Source: Yahoo Finance
Type: All files are in CSV format.
Currency: INR
All the tickers have been collected from here : https://www.nseindia.com/market-data/securities-available-for-trading
If using pandas, the following function is a utility to read any of the CSV files:
```
import pandas as pd

def read_ohlcv(filename):
    """Read a given OHLCV data file downloaded from yfinance."""
    return pd.read_csv(
        filename,
        skiprows=[0, 1, 2],  # skip the multi-index header rows that cause trouble
        names=["Date", "Close", "High", "Low", "Open", "Volume"],
        index_col="Date",
        parse_dates=["Date"],
    )
```
Analyzing stock prices is interesting.
Data from yahoo.com/finance: AMD and Google historical prices, 5/22/2009 ~ 5/03/2017, daily price and volume. There are 7 columns: Date, Open, High, Low, Close, Volume, Adj Close, with shape (2001, 7) for each stock.
Source: Yahoo Finance
I want to find the relationship between volume and price.
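One starting point is correlating volume with the size of the daily price move; a minimal pandas sketch with made-up numbers (not the actual AMD/Google data):

```python
import pandas as pd

# Illustrative daily closes and volumes.
df = pd.DataFrame({
    "close": [10.0, 10.5, 10.2, 11.0, 10.8],
    "volume": [1.0e6, 1.4e6, 0.9e6, 1.8e6, 1.1e6],
})
# Correlate volume with the absolute daily price change:
# large moves often coincide with heavy trading.
corr = df["close"].diff().abs().corr(df["volume"])
print(round(corr, 3))
```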