30 datasets found
  1. Center for Research in Security Prices (CRSP) Stock Files

    • archive.ciser.cornell.edu
    Updated Aug 7, 2024
    Cite
    Center for Research in Security Prices (2024). Center for Research in Security Prices (CRSP) Stock Files [Dataset]. https://archive.ciser.cornell.edu/studies/2191/project-description
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Center for Research in Security Prices
    Description

    The Center for Research in Security Prices (CRSP) stock databases provide time-series and event data on individual stocks, augmented with market time-series. Daily and monthly time-series variables include returns, closing, low bid and high ask prices, and trading volume. Event data includes distributions, shares outstanding, names, etc.

    Dataset is an external database available here for Cornell affiliates: https://johnson.library.cornell.edu/database/wharton-research-data-services-wrds/

  2. Data from: Center for Research in Security Prices

    • lseg.com
    Updated Nov 25, 2024
    Cite
    LSEG (2024). Center for Research in Security Prices [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/pricing-and-market-data/equities-market-data/center-for-research-in-security-prices
    Available download formats: csv, html, json, python, sql, user interface, xml
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    London Stock Exchange Group, http://www.londonstockexchangegroup.com/
    Authors
    LSEG
    License

    https://www.lseg.com/en/policies/website-disclaimer

    Description

    View LSEG's data from the Center for Research in Security Prices (CRSP), a leading provider of research, historical market data, and returns.

  3. Global Experiment: CRSP Central Data Base Restored Files

    • dataverse.harvard.edu
    Updated Dec 7, 2020
    Cite
    Hillary Egna; Cole Ensminger (2020). Global Experiment: CRSP Central Data Base Restored Files [Dataset]. http://doi.org/10.7910/DVN/4EAJOA
    Dataset updated
    Dec 7, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Hillary Egna; Cole Ensminger
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1982 - 1996
    Description

    Dataset contains recovered and restored data from the CRSP Central Database presented in two file formats: SQL and TAB. The original CSV files were converted to TAB on file upload by Dataverse. You can download the original CSV files by clicking the Download button and selecting "Original File Format (Comma Separated Values)." Data may not have been recovered completely. The entirety of the Central Data Base had been maintained on servers at Oregon State University (USA) and the Asian Institute of Technology (Thailand) that were decommissioned over a decade ago. Recovery involved accessing data files available from the archived PD/A CRSP website; it is uncertain whether those files were the most recently updated files from the Central Data Base. Metadata publications (loaded in the dataset) contain specific experimental protocols, analytical methods, instructions to data collectors for data entry, data entry protocols, and other metadata for each experimental cycle at each research location in Asia, Africa, and Central America. This dataset contains 117 files.

  4. Data from: CRSP CRISPR Therapeutics AG Common Shares (Forecast)

    • kappasignal.com
    Updated Dec 27, 2022
    Cite
    KappaSignal (2022). CRSP CRISPR Therapeutics AG Common Shares (Forecast) [Dataset]. https://www.kappasignal.com/2022/12/crsp-crispr-therapeutics-ag-common.html
    Dataset updated
    Dec 27, 2022
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    CRSP CRISPR Therapeutics AG Common Shares

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  5. Bid-Ask Spread Estimates for U.S. Stocks in CRSP

    • dataverse.harvard.edu
    Updated Aug 14, 2024
    Cite
    David Ardia; Emanuele Guidotti; Tim Alexander Kroencke (2024). Bid-Ask Spread Estimates for U.S. Stocks in CRSP [Dataset]. http://doi.org/10.7910/DVN/YAY4H6
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    David Ardia; Emanuele Guidotti; Tim Alexander Kroencke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains monthly estimates of the effective bid-ask spread for each stock in the CRSP U.S. Stock database. Additional code and data are available at https://github.com/eguidotti/bidask

  6. Center for Research in Security Prices (CRSP): Auxiliary Information for...

    • dataverse-staging.rdmc.unc.edu
    • dataverse.unc.edu
    pdf +1
    Updated Jun 17, 2013
    Cite
    CRSP; CRSP (2013). Center for Research in Security Prices (CRSP): Auxiliary Information for NASDAQ NMS Securities [Dataset]. https://dataverse-staging.rdmc.unc.edu/dataset.xhtml?persistentId=hdl:1902.29/D-17561
    Available download formats: text/plain; charset=us-ascii (45441), text/plain; charset=us-ascii (10449), pdf (1325642), text/plain; charset=us-ascii (7857)
    Dataset updated
    Jun 17, 2013
    Dataset provided by
    UNC Dataverse
    Authors
    CRSP; CRSP
    License

    https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=hdl:1902.29/D-17561

    Area covered
    United States
    Description

    This file contains data supplementary to the daily CRSP NASDAQ Master file.

  7. Financial Database Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 10, 2025
    Cite
    Market Report Analytics (2025). Financial Database Report [Dataset]. https://www.marketreportanalytics.com/reports/financial-database-75303
    Available download formats: pdf, doc, ppt
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global financial database market is experiencing robust growth, driven by increasing demand for real-time data analytics and insights across various financial sectors. The market, currently estimated at $15 billion in 2025, is projected to expand at a compound annual growth rate (CAGR) of 8% from 2025 to 2033, reaching approximately $28 billion by 2033. This expansion is fueled by several key factors. The rise of algorithmic trading and quantitative finance necessitates access to high-quality, comprehensive financial data, driving demand for both real-time and historical databases. Moreover, regulatory compliance requirements are pushing financial institutions to invest in robust data management systems, contributing to market growth. The increasing adoption of cloud-based solutions and advanced analytical tools further accelerates market expansion. The market is segmented by application (personal and commercial use) and database type (real-time and historical). The commercial segment currently dominates, propelled by the needs of large financial institutions, investment banks, and asset management firms. However, the personal use segment is expected to witness significant growth driven by the increasing accessibility of financial data and analytical tools to individual investors. Geographical distribution shows a strong presence in North America and Europe, which are expected to remain dominant markets due to the established financial infrastructure and advanced technological capabilities. However, Asia-Pacific is anticipated to demonstrate the fastest growth, driven by increasing economic activity and the expansion of financial markets in emerging economies. Competition is intense, with established players like Bloomberg and Refinitiv (Thomson Reuters) alongside emerging niche players. The competitive landscape is marked by both established giants and agile newcomers. 
    Established players, like Bloomberg, Thomson Reuters, and WRDS, leverage their extensive data networks and brand reputation. However, they are challenged by newer entrants offering innovative solutions and specialized datasets targeting specific niche markets. Ongoing technological advancements, such as the rise of big data analytics and artificial intelligence, present both opportunities and challenges. While AI-powered analytics unlock deeper insights from financial data, the need to adapt to evolving technologies and data security concerns require substantial investment. Regulatory changes and data privacy concerns also represent potential restraints, requiring continuous adaptation and compliance measures. The future of the market hinges on the ability of players to innovate, adapt to evolving regulations, and meet the increasing demand for speed, accuracy, and comprehensive financial data insights. The market's trajectory strongly suggests a promising future for both established and emerging companies.
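
As a quick arithmetic check on the projection quoted in this description, the sketch below compounds the stated $15 billion 2025 estimate at the stated 8% CAGR through 2033. The function name `project_value` is hypothetical and used only for illustration; the inputs are the figures given in the description.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value at annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


# Figures taken from the description: $15 billion in 2025, 8% CAGR, horizon 2033.
start_billion = 15.0
cagr = 0.08
years = 2033 - 2025

projected_2033 = project_value(start_billion, cagr, years)
print(f"Projected 2033 market size: ${projected_2033:.2f} billion")
```

Compounding over the eight years from 2025 to 2033 gives a value that rounds to the "approximately $28 billion" quoted in the description.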

  8. Replication Data for: The Burden of the National Debt: Evidence from Mergers...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Wu, Yanhui (2023). Replication Data for: The Burden of the National Debt: Evidence from Mergers and Acquisitions [Dataset]. http://doi.org/10.7910/DVN/A7Y2CD
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Wu, Yanhui
    Description

    Replication code for Dissanayake, Wu and Zhang, "The Burden of the National Debt: Evidence from Mergers and Acquisitions". The main data come from the Compustat, CRSP, and Thomson Reuters databases, all of which are accessible via a subscription. The pseudo-dataset "madebt_pseudo.dta" demonstrates the format of the original dataset for all firm-years, with variable names. The file "ivfirststage_pseudo.dta" contains pseudo data for all the variables used in the first stage of the IV model as in Equation (3), except that macro variables with publicly available data contain real values.

  9. BIOGRID CURATED DATA FOR PUBLICATION: Structure, function, and...

    • thebiogrid.org
    zip
    Updated Feb 8, 2002
    Cite
    BioGRID Project (2002). BIOGRID CURATED DATA FOR PUBLICATION: Structure, function, and activator-induced conformations of the CRSP coactivator. [Dataset]. https://thebiogrid.org/124774/publication/structure-function-and-activator-induced-conformations-of-the-crsp-coactivator.html
    Available download formats: zip
    Dataset updated
    Feb 8, 2002
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Taatjes DJ (2002): Structure, function, and activator-induced conformations of the CRSP coactivator, curated by BioGRID (https://thebiogrid.org).

    ABSTRACT: The human cofactor complexes ARC (activator-recruited cofactor) and CRSP (cofactor required for Sp1 activation) mediate activator-dependent transcription in vitro. Although these complexes share several common subunits, their structural and functional relationships remain unknown. Here, we report that affinity-purified ARC consists of two distinct multisubunit complexes: a larger complex, denoted ARC-L, and a smaller coactivator, CRSP. Reconstituted in vitro transcription with biochemically separated ARC-L and CRSP reveals differential cofactor functions. The ARC-L complex is transcriptionally inactive, whereas the CRSP complex is highly active. Structural determination by electron microscopy (EM) and three-dimensional reconstruction indicate substantial differences in size and shape between ARC-L and CRSP. Moreover, EM analysis of independently derived CRSP complexes reveals distinct conformations induced by different activators. These results suggest that CRSP may potentiate transcription via specific activator-induced conformational changes.

  10. Semi-coskewnesses and the cross-section of excepted stock returns: Evidence...

    • dataone.org
    • datadryad.org
    Updated Dec 14, 2023
    Cite
    Weiyi Liu; Huilin Zhou; Ronghua Luo; Xuan Liang (2023). Semi-coskewnesses and the cross-section of excepted stock returns: Evidence from China [Dataset]. http://doi.org/10.5061/dryad.80gb5mkx7
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Weiyi Liu; Huilin Zhou; Ronghua Luo; Xuan Liang
    Time period covered
    Jan 1, 2023
    Description

    We propose an alternative nonlinear semi-risk measure, by decomposing the traditional coskewness into four components associated with the signed excess market and asset returns, that captures the asymmetries in nonlinear markets. We find that the two semi-coskewnesses attributable to (positive) negative excess market returns predict significantly (lower) higher future returns based on high-frequency data from China’s A-share market. After conducting a wide range of implementations, the risk premium for negative coskewness stands out as the most significant, followed by the premium for mixed negative coskewness. In contrast, the results for positive and mixed positive coskewnesses are not always significantly negative. More importantly from an economically meaningful perspective, for a downside risk premium of 25.40% per annum, a 2-standard-deviation increase in negative semi-coskewness gives rise to an increase of approximately 13.71% in annual expected return.

    We calculate daily realized semi-coskewnesses using 5-minute intraday returns and aggregate them to obtain weekly frequency based on all the listed stocks of China’s A-share stock market. We also extract market capitalization, turnover rate, and book-to-market ratio for each stock from the CRSP database and the RESSET Financial Research database respectively. Lastly, we use daily returns to compute weekly returns, realized (co)moments, realized semi-risk factors, and lagged returns as well as maximum/minimum returns over the previous month.

    Semi-coskewnesses and the cross-section of excepted stock returns: Evidence from China

    Our empirical data consists of high-frequency intraday data on the opening and closing prices of all stocks (shares issued to domestic investors) from the China A-share stock market, along with weekly data on equity-related indicators.

    Description of the data and file structure

    The sample period selected in this paper spans from April 8, 2005 to April 22, 2022, with daily trading hours from 9:30 to 11:30 and 13:00 to 15:00 in GMT+8. Too high a data frequency results in greater market microstructure noise, while too low a data frequency makes it difficult to capture high-frequency signals. Liu, Patton and Sheppard (2015) provide evidence that estimates of realized measures based on a 5-minute data frequency perform well in terms of prediction accuracy, and a number of studies (Patton and Sheppard, 2015; Amaya, Christoffersen, Jacobs, and Vasquez, 2015; Bollerslev, 2020) adopt this frequ...

  11. Data for common institutional ownership and innovation

    • scidb.cn
    Updated Aug 8, 2025
    Cite
    Xinyu Liu; Wenjun Cai (2025). Data for common institutional ownership and innovation [Dataset]. http://doi.org/10.57760/sciencedb.28803
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Xinyu Liu; Wenjun Cai
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our sample consists of annual data from firms listed on the A-share markets of the Shanghai and Shenzhen Stock Exchanges in China, covering the period from 2003 to 2022. We gather the necessary data on listed firms from two databases: the Chinese Innovation Research Database (CIRD) for firms’ innovation and the China Stock Market & Accounting Research Database (CSMAR) for common ownership. CIRD includes patent data filed by or granted to different entities, distinguishing between three types of patents (invention, utility model, and design), and also provides key information such as the nature of applications (independent or joint), classification numbers, and patent statistics. CSMAR is positioned as a research-oriented precision database that follows the standards of authoritative databases such as CRSP and COMPUSTAT and is intended for research and quantitative investment analysis. We match the innovation data to the financial data for each firm, exclude listed financial companies and ST and *ST listed companies, and delete samples with missing data. To avoid interference from extreme values, we winsorize all continuous variables at the 1% level. With these filters, our final sample consists of 48,956 firm-year observations for 4,957 firms.
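
The winsorizing step mentioned in this description can be illustrated with a minimal sketch. It assumes that "at the 1% level" means clipping each variable at its 1st and 99th percentiles, which the description does not state explicitly; the function names `percentile` and `winsorize` and the sample data are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values below the lower percentile and above the upper percentile.

    The 1st/99th percentile cutoffs are an assumed reading of "the 1% level".
    """
    ordered = sorted(values)
    lo_cut = percentile(ordered, lower)
    hi_cut = percentile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Hypothetical variable: the integers 1..100 plus one extreme outlier.
sample = list(range(1, 101)) + [10_000]
clipped = winsorize(sample)
print(min(clipped), max(clipped))
```

Unlike trimming, winsorizing keeps every observation and only replaces the extreme values with the cutoff values, so the sample size is unchanged.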

  12. CRSP-2

    • rgd.mcw.edu
    Updated Feb 24, 2017
    Cite
    Rat Genome Database (2017). CRSP-2 [Dataset]. https://rgd.mcw.edu/rgdweb/report/gene/main.html?id=12411553
    Dataset updated
    Feb 24, 2017
    Dataset authored and provided by
    Rat Genome Database
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ENCODES a protein that exhibits hormone activity (inferred); INVOLVED IN signal transduction (inferred); FOUND IN extracellular region (inferred)

  13. CRSP-3

    • rgd.mcw.edu
    Updated Feb 4, 2017
    Cite
    Rat Genome Database (2017). CRSP-3 [Dataset]. https://rgd.mcw.edu/rgdweb/report/gene/main.html?id=12137338
    Dataset updated
    Feb 4, 2017
    Dataset authored and provided by
    Rat Genome Database
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ENCODES a protein that exhibits identical protein binding (ortholog); protein-containing complex binding (ortholog); signaling receptor binding (ortholog); INVOLVED IN activation of adenylate cyclase activity (ortholog); activation of protein kinase activity (ortholog); adenylate cyclase-activating G protein-coupled receptor signaling pathway (ortholog); ASSOCIATED WITH acute necrotizing pancreatitis (ortholog); amyotrophic lateral sclerosis (ortholog); Animal Disease Models (ortholog); FOUND IN axon (ortholog); cytoplasm (ortholog); extracellular space (ortholog)

  14. CRSP-4

    • rgd.mcw.edu
    Updated Feb 5, 2017
    Cite
    Rat Genome Database (2017). CRSP-4 [Dataset]. https://rgd.mcw.edu/rgdweb/report/gene/main.html?id=12394336
    Dataset updated
    Feb 5, 2017
    Dataset authored and provided by
    Rat Genome Database
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ENCODES a protein that exhibits hormone activity (inferred); INVOLVED IN signal transduction (inferred); FOUND IN extracellular region (inferred)

  15. ETFs - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). ETFs - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/etfs
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in this paper is the ETFs used in the CRSP database.

  16. Yorktown Management & Research Co Inc reported holding of CRSP

    • filingexplorer.com
    Updated Mar 31, 2021
    Cite
    Yorktown Management & Research Co Inc (2021). Yorktown Management & Research Co Inc reported holding of CRSP [Dataset]. https://www.filingexplorer.com/form13f-holding/H17182108?cik=0001313559&period_of_report=2021-03-31
    Dataset updated
    Mar 31, 2021
    Dataset authored and provided by
    Yorktown Management & Research Co Inc
    Description

    Historical ownership data of CRSP by Yorktown Management & Research Co Inc

  17. Common Ownership Data: Scraped SEC form 13F filings for 1999-2017

    • dataverse.harvard.edu
    Updated Aug 17, 2020
    Cite
    Matthew Backus; Christopher T Conlon; Michael Sinkinson (2020). Common Ownership Data: Scraped SEC form 13F filings for 1999-2017 [Dataset]. http://doi.org/10.7910/DVN/ZRH3EU
    Dataset updated
    Aug 17, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Backus; Christopher T Conlon; Michael Sinkinson
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2017
    Description

    Introduction

    In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers that have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. However, the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

    The process

    We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here as they can be downloaded from EDGAR). We wrote code to parse the filings to extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of the file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags. Prior to that date, the filings are remarkable for the heterogeneity in formatting. Several examples are linked to below.

    Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempting to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this will exhaustively extract all holding information. We can provide examples of filings that are formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep an amended level of holdings if the amended report a) occurred within 90 days of the reporting date and b) the initial filing fails our logic check described above.

    The resulting dataset has around 48M reported holdings (CIK-CUSIP) for all 76 quarters and between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method seemed to produce more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

    Included Files

    • Perl Parsing Code (find_holdings_snp.pl). For reference, only needed if you wish to re-parse original filings.

    • Investor holdings for 1999-2017: lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records. The fields are:

        CIK: the central index key assigned by the SEC for this investor. Mapping to names is available below.

        CUSIP: the identity of the holdings. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.

        shares: the number of shares reportedly held. Merging in CRSP data on shares outstanding at the CUSIP-Month level allows one to construct \beta. We make no distinction for the sole/shared/none voting discretion fields. If a researcher is interested, we did collect that starting in mid-2013, when filings are in XML format.

        rdate: reporting date (end of quarter). 8 digit, YYYYMMDD.

        fdate: filing date. 8 digit, YYYYMMDD.

        ftype: the form name.

    Notes: we did not consolidate separate BlackRock entities (or any other possibly related entities). If one wants to do so, use the CIK-CIKname mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean the investor filing lists a holding for the firm but that our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers that find a zero know to go back to that source filing to manually gather the...
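
The validation and cleaning rules in this description can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' Perl code: the function names are hypothetical, the 5% tolerance in the logic check is an assumption (the description gives none), and price, shares, and value are assumed to be in consistent units.

```python
def passes_logic_check(price, shares, value, rel_tol=0.05):
    """Return True when price * shares is within rel_tol of the reported value.

    rel_tol is a hypothetical tolerance; the description does not state one.
    """
    if value <= 0:
        return False
    return abs(price * shares - value) <= rel_tol * value


def keep_cusip_rdate(shares_held, shares_outstanding):
    """Apply the two drop rules to one CUSIP-rdate observation.

    shares_held lists the share counts reported by each 13(f) investor.
    """
    # Drop if any single investor reports more than 50% of shares outstanding.
    if any(h > 0.5 * shares_outstanding for h in shares_held):
        return False
    # Drop if reported holdings sum to more than 120% of shares outstanding.
    return sum(shares_held) <= 1.2 * shares_outstanding


# Made-up numbers for illustration only.
consistent = passes_logic_check(price=20.0, shares=1_000, value=20_100.0)
inconsistent = passes_logic_check(price=20.0, shares=1_000, value=2_010.0)
kept = keep_cusip_rdate([300, 200, 100], shares_outstanding=1_000)
dropped_majority = keep_cusip_rdate([600, 100], shares_outstanding=1_000)
dropped_total = keep_cusip_rdate([450, 450, 400], shares_outstanding=1_000)
print(consistent, inconsistent, kept, dropped_majority, dropped_total)
```

A relative tolerance is used instead of strict equality because an extracted value will rarely match price times shares exactly.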

  18. Data from: Silk genes and silk gene expression in the spider Tengella...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 20, 2018
    Cite
    Clarke III, Thomas H.; Correa-Garhwal, Sandra M.; Hayashi, Cheryl Y.; Chan, Fanny S.; Alaniz, Liliana G.; Chaw, R. Crystal; Alfaro, Rachael E. (2018). Silk genes and silk gene expression in the spider Tengella perfuga (Zoropsidae), including a potential cribellar spidroin (CrSp) [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000690394
    Dataset updated
    Sep 20, 2018
    Authors
    Clarke III, Thomas H.; Correa-Garhwal, Sandra M.; Hayashi, Cheryl Y.; Chan, Fanny S.; Alaniz, Liliana G.; Chaw, R. Crystal; Alfaro, Rachael E.
    Description

    Most spiders spin multiple types of silk, including silks for reproduction, prey capture, and draglines. Spiders are a megadiverse group and the majority of spider silks remain uncharacterized. For example, nothing is known about the silk molecules of Tengella perfuga, a spider that spins sheet webs lined with cribellar silk. Cribellar silk is a type of adhesive capture thread composed of numerous fibrils that originate from a specialized plate-like spinning organ called the cribellum. The predominant components of spider silks are spidroins, members of a protein family synthesized in silk glands. Here, we use silk gland RNA-Seq and cDNA libraries to infer T. perfuga silks at the protein level. We show that T. perfuga spiders express 13 silk transcripts representing at least five categories of spider silk proteins (spidroins). One category is a candidate for cribellar silk and is thus named cribellar spidroin (CrSp). Studies of ontogenetic changes in web construction and spigot morphology in T. perfuga have documented that after sexual maturation, T. perfuga females continue to make capture webs but males halt web maintenance and cease spinning cribellar silk. Consistent with these observations, our candidate CrSp was expressed only in females. The other four spidroin categories correspond to paralogs of aciniform, ampullate, pyriform, and tubuliform spidroins. These spidroins are associated with egg sac and web construction. Except for the tubuliform spidroin, the spidroins from T. perfuga contain novel combinations of amino acid sequence motifs that have not been observed before in these spidroin types. Characterization of T. perfuga silk genes, particularly CrSp, expand the diversity of the spidroin family and inspire new structure/function hypotheses.

  19. Global Experiment Data Reports for Pond Dynamics and Aquaculture

    • dataverse.harvard.edu
    Updated Aug 31, 2020
    Cite
    Hillary Egna (2020). Global Experiment Data Reports for Pond Dynamics and Aquaculture [Dataset]. http://doi.org/10.7910/DVN/GIXODR
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Hillary Egna
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1982 - 1987
    Description

    Collaborative Research Data Report Series

    Through 1996, the Pond Dynamics/Aquaculture Collaborative Research Support Program (PD/A CRSP) conducted a standardized global experiment at sites in seven countries: Egypt, Rwanda, Honduras, Panama, Thailand, the Philippines, and Indonesia. Data were uploaded into a Central Data Base from these seven countries (see Global Experiment dataset in this dataverse). Data Reports covers the first three cycles from the original participating sites from 1982-1987: Indonesia, Philippines, Rwanda, Honduras, Gualaca Panama, and Aguadulce Panama. Data Reports reported on verified data from the PD/A CRSP Central Data Base and presented interpretations of site-specific results. Research results from experiments conducted after the first three cycles were not included in Data Reports. They were later published in various aquaculture journals or in the program's own PD/A CRSP Research Report series: https://aquafishcrsp.oregonstate.edu/aquafish-nop

    The first of the Data Reports, General Reference: Volume I (first and second editions), provides descriptive information for each of the PD/A CRSP sites and serves as a guide for the entire Data Report series. Volume I presents the physical characteristics of each site, including a geographical sketch, climatology, and water and soil analyses. The second edition of Volume I provides additional information about the CRSP research sites, including several sites added to the program since the first edition of Volume I. Subsequent volumes of Data Reports focus on each of the original sites in the first three cycles separately. Each volume includes one cycle (wet and dry seasons) of the PD/A CRSP Global Experiment. Therefore, with few exceptions, each original project site has three volumes devoted to it, representing the results of the three cycles of the global experiment. The experimental cycles are described in PD/A CRSP Work Plans, which are available as links from this dataset.

    EXCERPTED DESCRIPTION OF EXPERIMENTS (for full descriptions, please refer to the related links)

    During the planning of the PD/A CRSP, researchers recognized the need to improve the existing data available on pond culture systems. The technical literature about pond aquaculture contained general operating guidelines; however, the lack of standardization in experimental design, data collection, and analysis precluded statistical comparison between studies or sites and limited the literature's utility in predicting the performance of pond culture systems. The PD/A CRSP developed a standardized data base that could be used to evaluate pond performance over a broad range of environments.

    Experimental Design

    The statistical design for the first three cycles of the global experiments involves monitoring environmental and fish production variables at seven geographical locations. The different locations provided a spectrum of pond environments, in different climatic zones and geographies within 20 degrees of the equator. Observations specified in the annual work plans (experimental cycles) were made on twelve or more ponds at each location, except at Gualaca Panama, where ten ponds were used. The pond variables observed, frequency of observation, materials and methods for determination, and standardized reporting units are presented in the Work Plans and in other reports available on the PD/A CRSP website. Observations at each location were recorded by the research team involved at that location, and data were filed and managed in a centralized CRSP Data Base at Oregon State University until the early 1990s, and then at both OSU and the Asian Institute of Technology, Thailand.

    CRSP Work Plans

    The PD/A CRSP technical work plans were developed by a research team composed of U.S. and host country Principal Investigators, and then reviewed by the PD/A CRSP Technical Committee. Each work plan presents detailed experimental protocols for one experimental cycle. A cycle involves two series of observations of four to five months' duration. One set of observations was made during the dry season and the other during the wet season. Three work plans constitute the original series of the global experiment, and are tabulated in hard copy as Data Reports. The rationale included managing all ponds in exactly the same way to establish a detailed baseline of pond variables. Then, in subsequent experiments, the pond environments were manipulated in different ways and the responses observed.

    First Cycle of the CRSP Global Experiment

    The first work plan was developed in 1983. This work plan specified standard methods for pond preparation and monitoring. All ponds were prepared in the same way, fish were stocked at the same levels, and specified variables were observed during both the wet and dry seasons. The sites of the research projects can be categorized as: brackish to marine tropical locations in Panama (Aguadulce) and the Philippines; warm, tropical, freshwater locations...

  20. International Aquaculture Curated Database

    • dataverse.harvard.edu
    Updated Jan 2, 2018
    Cite
    Hillary Egna (2018). International Aquaculture Curated Database [Dataset]. http://doi.org/10.7910/DVN/JFG7F5
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 2, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Hillary Egna
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The International Aquaculture Curated Database (IACD), created by the AquaFish Innovation Lab, consists of 542 articles, written by 1706 authors in 121 journals, all of which were published between 1983 and 2016. The IACD draws from peer-reviewed papers whose research was supported by four separate international aquaculture programs developed by Hillary Egna: 1. Pond Dynamics/Aquaculture CRSP (1982-1996); 2. Aquaculture CRSP (1996-2008); 3. AquaFish CRSP (2006-2013); and 4. AquaFish Innovation Lab (2013-Present). The IACD was compiled by two AquaFish Innovation Lab faculty and a student who reviewed both electronic and hard copies of journal articles. Every publication since 1983 was recorded with relevant publication information, including full names, gender of authors, and author position, with the percentage of unknowns being less than 1%. Gender of authors was recorded either by Egna, based on a personal connection to the author, or by the lead authors themselves. For privacy reasons, some of the publication details were removed.

Cite
Center for Research in Security Prices (2024). Center for Research in Security Prices (CRSP) Stock Files [Dataset]. https://archive.ciser.cornell.edu/studies/2191/project-description
Center for Research in Security Prices (CRSP) Stock Files

15 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Aug 7, 2024
Dataset authored and provided by
Center for Research in Security Prices
Description

The Center for Research in Security Prices (CRSP) stock databases provide time-series and event data on individual stocks, augmented with market time-series. Daily and monthly time-series variables include returns, closing, low bid and high ask prices, and trading volume. Event data includes distributions, shares outstanding, names, etc.

Dataset is an external database available here for Cornell affiliates: https://johnson.library.cornell.edu/database/wharton-research-data-services-wrds/
