I was wondering whether I could build a startup prediction model that predicts the success or failure of a startup, in order to save a lot of energy and resources.
This database is open-sourced by Crunchbase.
I am sure that startup enthusiasts will love this dataset and try to build working models.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
CrunchBase is an online platform providing information about startups and technology companies, including related entities such as the products they sell, the key people they employ, and the investments they have made and received.
We provide here an RDF data set of Crunchbase as of October 2015. The data set contains information about
1,946,435 jobs
1,348,449 websites
567,937 organizations
519,763 news
430,093 people
60,076 products, and
33,127 acquisitions.
The data set has been used, among other things, for data integration with financial data sources to evaluate the performance of particular companies and for monitoring news to find statements that are not in Crunchbase as an RDF knowledge graph yet.
Note that the provided data set was created in October 2015, when all Crunchbase data was licensed under the Creative Commons Attribution-NonCommercial License 4.0 (CC-BY-NC) and partly under the Creative Commons Attribution License 4.0 (CC-BY). The provided data set is licensed under these licenses as well. For the licensing of current Crunchbase data, we refer to https://about.crunchbase.com/terms-of-service/.
For more information about the data set, see our paper A Linked Data Wrapper for CrunchBase.
When you use the data set, please cite us as follows:
Michael Färber, Carsten Menne, Andreas Harth. “A Linked Data Wrapper for CrunchBase”. In: Semantic Web Journal 9(4). IOS Press, 2018, pp. 505–515. (BibTeX entry at DBLP)
Gain a competitive edge with the Crunchbase Dataset: worldwide firmographic and startup insights, ready for action.
The CrunchBase Open Data Map is our answer for anyone that wants to reference CrunchBase or include basic CrunchBase profile information within their application. The data map includes the core CrunchBase information for People and Organizations in the CrunchBase Dataset.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Crunchbase company data includes a variety of startup-related data released by Crunchbase, including company information, year of establishment, investment history, and industrial sector.
2) Data Utilization
(1) Characteristics of the Crunchbase company data:
• The dataset provides a variety of attributes that can affect the growth and success of a startup, including company name, year of establishment, location, industry, investment stage, investment amount, and investors.
(2) The Crunchbase company data can be used for:
• Development of a startup success prediction model: key attributes such as the year of establishment, investment history, and industrial sector can be used to build a model that predicts a startup's chances of success.
• Venture investment and market analysis: analyzing startup distribution and growth patterns by investment stage and industry can help identify venture investment strategies and market trends.
Startups are crucial to the expansion of the economy. They move the economy by bringing fresh perspectives, encouraging innovation, and generating jobs. Every day, dozens of new businesses are launched, and venture capital has grown to represent a sizable asset class, with annual investments topping $100 billion in the US alone. Predicting a startup's growth enables investors to identify businesses with the potential for rapid growth, giving them an advantage over the competition. Using the Crunchbase 2013 dataset, we can peek into this exciting world.
The dataset contains each startup's financial information and is labeled with the company's status (IPO, Operating, Acquired, Closed). The class distribution is extremely imbalanced:

| IPO | Closed | Acquired | Operating |
| --- | --- | --- | --- |
| 1.9% | 3.1% | 9.4% | 85.6% |
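With the class shares listed above, a model that always answers "Operating" would already be right about 85.6% of the time, so some form of reweighting is usually worth considering. The following is a minimal sketch, not part of the original dataset notes, that derives inverse-frequency class weights from the reported shares. The function name `balanced_class_weights` is hypothetical, and the formula is the common inverse-frequency heuristic rather than anything the dataset author prescribes.

```python
# Class shares as reported in the dataset description (percent of rows).
class_share = {"IPO": 1.9, "Closed": 3.1, "Acquired": 9.4, "Operating": 85.6}


def balanced_class_weights(shares):
    """Return weights proportional to 1/share.

    Uses the common inverse-frequency heuristic total / (n_classes * share),
    so every class contributes equally to the weighted total.
    """
    total = sum(shares.values())
    n_classes = len(shares)
    return {label: total / (n_classes * share) for label, share in shares.items()}


weights = balanced_class_weights(class_share)

# Print the rarest (most heavily weighted) class first.
for label, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<10} weight = {weight:.2f}")
```

The rarest class (IPO) receives the largest weight and the dominant class (Operating) a weight below 1. Whether reweighting, resampling, or a different evaluation metric is the better remedy depends on the model and is left open here.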
I would like to thank Yasin Shah, a Senior Software Engineer at Google, for providing us with this dataset.
Traffic analytics, rankings, and competitive metrics for crunchbase.com as of October 2025
https://cdla.io/sharing-1-0/
To track the latest trends, we've compiled small business and startup statistics in this dataset to better understand what makes a startup tick. If you're looking to build a startup or are just interested in diving into the numbers, check out these informative statistics on success, failure, funding and more before getting started.
Objective: The objective of the project is to predict whether a startup that is currently operating will turn into a success or a failure. Success is defined as an event that gives the company's founders a large sum of money through M&A (merger and acquisition) or an IPO (initial public offering). A company is considered to have failed if it had to be shut down.
This problem will be solved through a Supervised Machine Learning approach by training a model based on the history of startups which were either acquired or closed. The trained model will then be used to make predictions on startups which are currently operating to determine their success/failure.
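The setup described above can be sketched as a simple split of the records by status: companies with a known outcome form the labeled training set, and companies that are still operating form the set to be scored. This is only an illustration under stated assumptions. The records and the `status` field name are made up, and counting an IPO as a success in the training labels follows the definition in the objective rather than an explicit statement about the training data.

```python
# Hypothetical records; the status values mirror the labels named in the text.
startups = [
    {"name": "A", "status": "acquired"},
    {"name": "B", "status": "closed"},
    {"name": "C", "status": "operating"},
    {"name": "D", "status": "ipo"},
]

SUCCESS = {"acquired", "ipo"}  # exit via M&A or IPO (assumed to count as success)
FAILURE = {"closed"}           # company had to be shut down


def split_by_outcome(records):
    """Return (training, to_predict).

    training:   records with a known outcome, with label 1 (success) or 0 (failure)
    to_predict: records that are still operating and have no label yet
    """
    training, to_predict = [], []
    for rec in records:
        status = rec["status"].lower()
        if status in SUCCESS:
            training.append({**rec, "label": 1})
        elif status in FAILURE:
            training.append({**rec, "label": 0})
        elif status == "operating":
            to_predict.append(rec)
    return training, to_predict


training, to_predict = split_by_outcome(startups)
print(len(training), "labeled rows;", len(to_predict), "rows to score")
```

Any supervised classifier could then be fitted on `training` and applied to `to_predict`. The choice of model and features is outside the scope of this sketch.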
Do an EDA and try to predict which startups, and in which fields, achieve great success!
You will have to answer the following questions:
- How many new businesses fail?
- How many new businesses succeed?
- What are the reasons for failing?
- How can failure be avoided?
And many other questions...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyze startup success rates across 12 US cities using Crunchbase data. Does location impact Series A & B fundraising? Key insights for founders & VCs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover which startup sectors have the best odds of Series A success. Data shows 40% of search startups secure funding vs just 10% for hardware. Key trends revealed.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The data contains information on almost 700 startup companies backed by Y Combinator between 2005 and 2014. The data was collected and aggregated from SeedDB, CrunchBase and AngelList. It has then been cleaned and made consistent.
The variables included are:
Startups.csv
Company
Status
Year Founded
Mapping Location
Description
Categories
Founders
Y Combinator Year
Y Combinator Session
Investors
Amounts raised in different funding rounds
Office Address
Headquarters (City)
Headquarters (US State)
Headquarters (Country)
Logo
Seed-DB Profile
Crunchbase / Angel List Profile Website
Founders.csv
Founder
Company
Gender
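Because both files list a Company column, founders can presumably be attached to their startups by joining on it. The following is a minimal sketch using only the standard library. The inline rows are made-up stand-ins for Startups.csv and Founders.csv, and treating Company as the join key is an assumption based on the variable lists above, not something the dataset description states.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-ins for Startups.csv and Founders.csv (made-up rows).
startups_csv = io.StringIO(
    "Company,Year Founded\n"
    "ExampleCo,2010\n"
    "SampleInc,2012\n"
)
founders_csv = io.StringIO(
    "Founder,Company,Gender\n"
    "Alex Doe,ExampleCo,Female\n"
    "Sam Roe,ExampleCo,Male\n"
    "Kim Poe,SampleInc,Male\n"
)

# Group founder names by the shared "Company" column (assumed join key).
founders_by_company = defaultdict(list)
for row in csv.DictReader(founders_csv):
    founders_by_company[row["Company"]].append(row["Founder"])

# Attach the founder list to each startup row; companies without a match get [].
joined = [
    {**row, "Founder List": founders_by_company.get(row["Company"], [])}
    for row in csv.DictReader(startups_csv)
]

for row in joined:
    print(row["Company"], row["Year Founded"], row["Founder List"])
```

With the real files, the two `io.StringIO` objects would be replaced by opened file handles. Company names that are spelled differently in the two files would not match and would need cleaning first.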
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover the optimal months for startup fundraising based on 15+ years of VC investment data. Learn when VCs are most active and how to time your raise.
Columns:
• Organization Name
• Total Funding Amount
• Total Funding Amount USD
• Industries
• Headquarter Location
• Headquarter Region
• Operating Status
• Investment Stage
• Number of Investment
• Number of Exits
• Founded Date
• Industry Groups
• Number of Founders
• Number of Employees
• Number of Funding Rounds
• Funding Status
• Last Funding Date
• Last Funding Type
• Top 5 Investors
• Number of Investors
• Number of Acquisitions
• Acquisition Status
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Export of EDI_SubGrantees_DB at the end of the second reporting period (M27). It includes the data available at M27, since the 2nd round of incubation actually finishes in M28. Only non-privacy-sensitive fields from EDI_SubGrantees_DB are included in this export. This dataset corresponds to the following statement in the Grant Agreement: “EDI will publish an Open Dataset with the beneficiaries of our three open calls including their data and funding received. This dataset will be made available at FIWARE Lab. This has been proven as a beneficial measure for transparency but also discoverability of the companies and their investment received in EDI. At the same time, we will populate the wiki-sourced database of Crunchbase to facilitate access to the data of our start-ups and the incubator as such in one of the reference portals in the start-up ecosystem”.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Export of EDI_SubGrantees_DB at the end of the first reporting period (M15), i.e. March 2019. It includes the data available at M15, since the 1st round of incubation actually finishes in M16. Only non-privacy-sensitive fields from EDI_SubGrantees_DB are included in this export. This dataset corresponds to the following statement in the Grant Agreement: “EDI will publish an Open Dataset with the beneficiaries of our three open calls including their data and funding received. This dataset will be made available at FIWARE Lab. This has been proven as a beneficial measure for transparency but also discoverability of the companies and their investment received in EDI. At the same time, we will populate the wiki-sourced database of Crunchbase to facilitate access to the data of our start-ups and the incubator as such in one of the reference portals in the start-up ecosystem”.
These data were generated as part of a two-and-a-half-year ESRC-funded research project examining the digitalisation of higher education (HE) and the educational technology (Edtech) industry in HE. Building on a theoretical lens of assetisation, it focused on forms of value in the sector and on governance challenges of digital data. It followed three groups of actors: UK universities, Edtech companies, and investors in Edtech. The researchers first sought to develop an overview of the Edtech industry in HE by building three databases on Edtech companies, investors in Edtech, and investment deals, using data downloaded from Crunchbase, a proprietary platform. Due to Crunchbase’s Terms of Service, only parts of one database may be submitted to this repository, i.e. a list of companies with the project’s classification. A report offering descriptive analysis of all three databases was produced and is submitted as well. A qualitative discursive analysis was conducted by analysing seven documents in depth. In the second phase, researchers conducted interviews with participants representing the three groups of actors (n=43) and collected documents on their organisations. Moreover, documents from Big Tech (Microsoft, Amazon, and Salesforce) were collected to contextualise the role of global digital infrastructure in HE. Due to commercial sensitivity, only lists of the documents collected about investors and Big Tech are submitted to the repository. Researchers then conducted focus groups (n=6) with representatives of universities (n=19). The dataset includes transcripts of the focus groups and the written outputs produced by participants during the focus groups. Finally, a public consultation was held via a survey, and 15 participants offered qualitative answers.
The higher education (HE) sector has been marketised for decades; but the speed, scope, and extent of marketisation has led key education scholars to conceptualise it as a global industry (Verger, Lubienski, & Steiner-Khamsi, 2016). Further, the use of technology to transform teaching and learning, as well as the profound digitalisation of universities more broadly, has led universities to collect and process an unprecedented amount of digital data. Education technology (EdTech) companies have become one of the key players in the HE industry and the UK has made EdTech one of its key pillars in its recent international education strategy (HM Government, 2019). EdTech companies are reporting unprecedented growth. In 2019, Coursera became a 'unicorn' (i.e. a company worth over $1 billion), while British-based FutureLearn secured £50 million investment by selling 50% shares of the company. Investment in EdTech is growing at an impressive rate and reached $16.3bn in 2018 (ET, 2019). While EdTech start-up companies strive to become 'unicorns' and profit from HE, so too might universities increasingly look for new ways of profiting from the wealth of digital data they produce.
The study of HE markets has so far focused on service-commodities. However, data and data products do not act like commodities. Commodities are consumed once used, but data is reproducible at almost zero marginal cost. New products and services can be created from data and monetised through subscription fees, an app, or a platform that does not transfer ownership, control, or reproduction rights to the user. Furthermore, data use creates yet more data, and the network effects increase the value of these platforms. Therefore, there is a new quality at play in the monetisation and marketisation of these digital HE products and services: 'assetization'. We are witnessing a widespread change from creating value via market exchange towards extracting value via the ownership and control of assets.
This research project aims to investigate these new processes of value creation and extraction in an HE sector that is digitalising its operations and introducing new digital solutions premised on the expansion of service fees. By introducing a focus on assets, and economic rents, this project offers a theoretically and empirically transformative approach to understand emerging HE markets and their implications for the HE sector. The assetization of HE is consequential because of the legal and technical implications for its regulation. It is also crucial to examine in any discussion about the legitimate and socially just arrangement and distribution of assets, their ownership, and their uses. The project employs an innovative, comparative, and participatory mixed-methods research design. It combines digital methods, interviews, observation, document analysis, deliberative focus groups, knowledge exchange and co-production with stakeholders, and public consultation. Data analysis will include quantitative and qualitative analysis of investment trends, comparative case studies of investors, EdTech companies and universities, and social network analysis.
The application of this research project is fourfold. First, it will help universities understand the emerging processes of assetization so they can develop policies and practices for protecting their rights. Second, it will assist entrepreneurs in finding ways to incorporate ethical and sustainable considerations in their innovation processes. Third, it will mediate between the financial interests of investors and the social function of universities. Here, it will provide evidence for policymakers on how to include assets in HE sector regulation. Finally, it will unpack potential forms of inequality that assetization might bring into the HE sector.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore how Second Seed rounds grew to 18% of US tech funding, with median rounds hitting $1M - key data on this emerging startup financing trend.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set, extracted from Crunchbase by web scraping, contains 1,000 observations of organizations, a very small sample of the potential data set. The limit was set for practical reasons.
The information collected from Crunchbase is business information, in which it is possible to look up, among other fields,
the industries in which an organization operates ("Industries"),
where its headquarters are located ("Headquarters Region"),
its founding year ("Founded Date"),
its current status ("Operating Status"),
its contact email ("Contact Email"),
its contact phone number ("Phone Number"),
its founders ("Founders"),
if it is a fund, the companies and stages in which it invests ("Investment Stage"),
etc.
The types of organizations that can be observed in the data are:
Companies
Investors
Schools
The dataset was built to help venture capital funds spend less time searching for information and more time closing deals.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Export of EDI_SubGrantees_DB at the end of the project (M42, i.e. July 2021). It includes all the available data on subgrantees. Only non-privacy-sensitive fields from EDI_SubGrantees_DB are included in this export. This dataset corresponds to the following statement in the Grant Agreement: “EDI will publish an Open Dataset with the beneficiaries of our three open calls including their data and funding received. This dataset will be made available at FIWARE Lab. This has been proven as a beneficial measure for transparency but also discoverability of the companies and their investment received in EDI. At the same time, we will populate the wiki-sourced database of Crunchbase to facilitate access to the data of our start-ups and the incubator as such in one of the reference portals in the start-up ecosystem”. Updated with the latest funding data of subgrantees for rounds 1, 2 and 3 of EDI.
Data Fusion of: Companies: RAW - Companies House Records; Raw - Relatorios de Contas (act Nov 2016); RAW - Crunchbase and Webpages and FINOVA FCR´s: Raw - Relatorios de Contas (act Nov 2016); RAW - Crunchbase and Webpages and FINOVA