In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, and the records go back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, which streamlines the user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with the “_chr*” suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support. These public datasets are included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
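Because each dataset is split into one table per chromosome, a query that targets a single chromosome only needs to read that shard. The sketch below shows one way to derive the shard name from a base table name. The base name `example_dataset.variants` is a hypothetical placeholder, and the assumption that the 24 shards correspond to chromosomes 1 to 22 plus X and Y is not stated in the description above; check the actual table list before relying on it.

```python
# Assumed chromosome labels for the 24 shards: 1-22 plus X and Y.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def shard_table(base_table: str, chromosome: str) -> str:
    """Return the per-chromosome table name using the "_chr*" suffix."""
    # Accept both "7" and "chr7" as input.
    label = chromosome[3:] if chromosome.startswith("chr") else chromosome
    if label not in CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    return f"{base_table}_chr{label}"


# Hypothetical base table name, used only for illustration.
print(shard_table("example_dataset.variants", "chrX"))
```

Querying only the shard for the chromosome of interest is what keeps the bytes processed, and therefore the cost, low.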
You can check the field descriptions in the documentation: current Keyword database: https://docs.dataforseo.com/v3/databases/google/keywords/?bash; Historical Keyword database: https://docs.dataforseo.com/v3/databases/google/history/keywords/?bash. You don’t have to download fresh data dumps in JSON or CSV – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
You can check the field descriptions in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.
Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.
Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.
Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.
This database is available in JSON format only.
You don’t have to download fresh data dumps in JSON – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.
The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.
Banner photo by Helloquence on Unsplash
https://www.datainsightsmarket.com/privacy-policy
The Analytical Data Store Tools market is experiencing robust growth, driven by the increasing need for real-time insights and advanced analytics across diverse industries. The market, estimated at $50 billion in 2025, is projected to maintain a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $150 billion by 2033. This expansion is fueled by several key factors: the proliferation of big data, the rising adoption of cloud-based solutions offering scalability and cost-effectiveness, and the growing demand for improved decision-making capabilities across organizations. Key trends include the increasing integration of AI and machine learning into analytical data store tools, the emergence of serverless architectures, and a focus on enhanced data security and governance. While the market faces challenges like data integration complexities and the need for skilled professionals, the overall outlook remains positive, driven by continued innovation and expanding enterprise adoption. The competitive landscape is highly dynamic, with major players like Google, Snowflake, Microsoft, Amazon, and Oracle leading the charge. These established players are constantly innovating and expanding their offerings to meet evolving customer needs, while smaller, specialized companies are emerging to cater to niche requirements. The market's segmentation reflects this diversity, with solutions catering to various data volumes, industry verticals, and deployment models (cloud, on-premise, hybrid). Geographical expansion, particularly in rapidly developing economies, presents a significant opportunity for growth. The historical period (2019-2024) likely saw a slower growth rate than the projected future growth, reflecting the time taken for market maturity and broader adoption of cloud technologies. The continued focus on data-driven decision-making across industries ensures the sustained growth trajectory of the Analytical Data Store Tools market.
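The 2033 figure follows from compounding the 2025 estimate at the stated rate. As a quick arithmetic check (a sketch, assuming eight annual compounding periods between 2025 and 2033 and values expressed in billions of dollars):

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at an annual growth rate of cagr for the given number of years."""
    return start_value * (1 + cagr) ** years


# $50 billion in 2025, growing 15% per year through 2033 (8 periods).
projected_2033 = project_value(50.0, 0.15, 2033 - 2025)
print(f"{projected_2033:.1f}")  # about 153, consistent with "approximately $150 billion"
```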
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
ChEMBL is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.
ChEMBL is a manually curated database of bioactive molecules with drug-like properties used in drug discovery, including information about existing patented drugs.
Schema: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_schema.png
Documentation: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/schema_documentation.html
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
“ChEMBL” by the European Bioinformatics Institute (EMBL-EBI), used under CC BY-SA 3.0. Modifications have been made to add normalized publication numbers.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:ebi_chembl
You can check the field descriptions in the documentation: regular SERP: https://docs.dataforseo.com/v3/databases/google/serp_regular/?bash; Advanced SERP: https://docs.dataforseo.com/v3/databases/google/serp_advanced/?bash; Historical SERP: https://docs.dataforseo.com/v3/databases/google/history/serp_advanced/?bash. You don’t have to download fresh data dumps in JSON or CSV – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This database is the result of a combined effort of several European and US researchers to collect, clean and harmonise disclosed SEPs data at thirteen major standard setting organisations (including ETSI, ITU, IEEE, ISO, and more).
Disclosed Standard Essential Patents (dSEP) Data provides a full overview of disclosed intellectual property rights at standard setting organizations worldwide.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:dsep
“Disclosed Standard Essential Patents Database” by Bekkers, R., Catalini, C., Martinelli, A., & Simcoe, T. (2012). Intellectual Property Disclosure in Standards Development. Proceedings from NBER conference on Standards, Patents & Innovation, Tucson (AZ), January 20 and 21, 2012.
Banner photo by Helloquence on Unsplash
https://dataintelo.com/privacy-and-policy
The Non-Relational SQL market size is projected to grow from USD 4.7 billion in 2023 to USD 15.8 billion by 2032, at a compound annual growth rate (CAGR) of 14.5% during the forecast period. This significant growth can be attributed to the rising demand for scalable and flexible database management solutions that efficiently handle large volumes of unstructured data.
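The quoted growth rate can be checked against the two endpoint values. The sketch below derives the growth rate implied by the start and end figures, assuming 2023 is the base year and nine annual compounding periods run to 2032; the report may define its forecast period differently.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 4.7 billion (2023) to USD 15.8 billion (2032), nine periods assumed.
rate = implied_cagr(4.7, 15.8, 2032 - 2023)
print(f"{rate:.1%}")  # close to the 14.5% stated in the report
```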
One of the primary growth factors driving the Non-Relational SQL market is the exponential increase in data generation from various sources such as social media, IoT devices, and enterprise applications. As businesses seek to leverage this data for gaining insights and making informed decisions, the need for databases that can manage and process unstructured data efficiently has become paramount. Non-Relational SQL databases, such as document stores and graph databases, provide the required flexibility and scalability, making them an ideal choice for modern data-driven enterprises.
Another significant growth factor is the increasing adoption of cloud-based solutions. Cloud deployment offers numerous advantages, including reduced infrastructure costs, scalability, and easier management. These benefits have led to a surge in the adoption of Non-Relational SQL databases hosted on cloud platforms. Major cloud service providers like Amazon Web Services, Microsoft Azure, and Google Cloud offer robust Non-Relational SQL database services, further fueling market growth. Additionally, the integration of AI and machine learning with Non-Relational SQL databases is expected to enhance their capabilities, driving further adoption.
The rapid advancement in technology and the growing need for real-time data processing and analytics are also propelling the market's growth. Non-Relational SQL databases are designed to handle high-velocity data and provide quick query responses, making them suitable for real-time applications such as fraud detection, recommendation engines, and personalized marketing. As organizations increasingly rely on real-time data to enhance customer experiences and optimize operations, the demand for Non-Relational SQL databases is set to rise.
Regional outlook indicates that North America holds the largest share of the Non-Relational SQL market, driven by the presence of major technology companies and early adoption of advanced database technologies. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by the rapid digital transformation initiatives and increasing investments in cloud infrastructure. Europe and Latin America also present significant growth opportunities due to the rising adoption of big data and analytics solutions.
When analyzing the Non-Relational SQL market by database type, we observe that document stores hold a significant share of the market. Document stores, such as MongoDB and Couchbase, are particularly favored for their ability to store, retrieve, and manage document-oriented information. These databases are highly flexible, allowing for the storage of complex data structures and providing an intuitive query language. The increasing adoption of document stores can be ascribed to their ease of use and adaptability to various application requirements, making them a popular choice among developers and businesses.
Key-Value stores represent another crucial segment of the Non-Relational SQL market. These databases are known for their simplicity and high performance, making them ideal for caching, session management, and real-time data processing applications. Redis and Amazon DynamoDB are prominent examples of key-value stores that have gained widespread acceptance. The growing need for low-latency data access and the ability to handle massive volumes of data efficiently are key drivers for the adoption of key-value stores in various industries.
The market for column stores is also expanding as businesses require databases that can handle large-scale analytical queries efficiently. Wide-column stores, such as Apache Cassandra and HBase, optimize read and write performance for analytical processing, making them suitable for big data analytics and business intelligence applications. The ability to perform complex queries on large datasets quickly is a significant advantage of column stores, driving their adoption in industries that rely heavily on data analytics.
Graph databases, such as Neo4j and Amazon Neptune, are gaining traction due to their ability to model
This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here. For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. Users of The New York Times public-use data files must comply with data use restrictions to ensure that the information will be used solely for noncommercial purposes.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dimensions is the largest database of research insight in the world. It represents the most comprehensive collection of linked data related to the global research and innovation ecosystem available in a single platform. Because Dimensions maps the entire research lifecycle, you can follow academic and industry research from early stage funding, through to output and on to social and economic impact. Businesses, governments, universities, investors, funders and researchers around the world use Dimensions to inform their research strategy and make evidence-based decisions on the R&D and innovation landscape. With Dimensions on Google BigQuery, you can seamlessly combine Dimensions data with your own private and external datasets; integrate with Business Intelligence and data visualization tools; and analyze billions of data points in seconds to create the actionable insights your organization needs.
Examples of usage: competitive intelligence; horizon-scanning and emerging trends; innovation landscape mapping; academic and industry partnerships and collaboration networks; Key Opinion Leader (KOL) identification; recruitment and talent; performance and benchmarking; tracking funding dollar flows and citation patterns; literature gap analysis; marketing and communication strategy; social and economic impact of research.
About the data: Dimensions is updated daily and constantly growing. It contains over 112m linked research publications, 1.3bn+ citations, 5.6m+ grants worth $1.7trillion+ in funding, 41m+ patents, 600k+ clinical trials, 100k+ organizations, 65m+ disambiguated researchers and more. The data is normalized, linked, and ready for analysis.
Dimensions is available as a subscription offering. For more information, please visit www.dimensions.ai/bigquery and a member of our team will be in touch shortly. If you would like to try our data for free, please select "try sample" to see our openly available Covid-19 data.
Brand performance data collected from AI search platforms for the query "best database for large scale analytics".
https://www.marketreportanalytics.com/privacy-policy
The Structured Query Language (SQL) server transformation market is experiencing robust growth, projected to reach $15 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 9.4% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of cloud-based solutions and the rise of big data analytics are pushing organizations to adopt more efficient and scalable SQL server solutions. Furthermore, the growing demand for real-time data processing and improved data integration capabilities within large enterprises and SMEs is significantly driving market growth. The market segmentation reveals strong demand across various application areas, with large enterprises leading the way due to their greater need for robust and scalable data management infrastructure. Data integration scripts remain a prominent segment, highlighting the critical need for seamless data flow across diverse systems. The competitive landscape is marked by established players like Oracle, IBM, and Microsoft, alongside emerging innovative companies specializing in cloud-based SQL server technologies. Geographic analysis suggests North America and Europe currently hold the largest market share, but significant growth potential exists in the Asia-Pacific region, driven by rapid digital transformation and economic growth in countries like India and China. The restraints on market growth are primarily related to the complexities involved in migrating existing legacy systems to new SQL server solutions, along with the need for skilled professionals to manage and optimize these systems. However, the ongoing advancements in automation tools and the increased availability of training programs are mitigating these challenges. The future trajectory of the market indicates continued growth, driven by emerging technologies such as AI-powered query optimization, enhanced security features, and the growing adoption of serverless architectures. 
This will lead to a wider adoption of SQL server transformation across various sectors, including finance, healthcare, and retail, as organizations seek to leverage data to gain competitive advantage and improve operational efficiency. The market is ripe for innovation and consolidation, with opportunities for both established players and new entrants to capitalize on this ongoing transformation.
https://www.marketreportanalytics.com/privacy-policy
The Analytics Query Accelerator (AQA) market is experiencing robust growth, driven by the increasing demand for faster and more efficient data analysis across various industries. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $50 billion by 2033. This growth is fueled by several key factors. Firstly, the exponential growth of data volume necessitates faster query processing times, making AQAs indispensable for businesses aiming to gain real-time insights. Secondly, the rising adoption of cloud-based analytics platforms and big data technologies creates a fertile ground for AQA solutions. Furthermore, the increasing need for advanced analytics capabilities in sectors such as finance, healthcare, and e-commerce is further driving market expansion. Finally, continuous technological advancements, including the development of more powerful processors and optimized algorithms, are improving AQA performance and expanding their application across various use cases. However, the market also faces certain challenges. High initial investment costs and the complexity of implementation can hinder adoption, particularly among smaller businesses. Furthermore, the need for skilled professionals to manage and maintain AQA systems poses another barrier. Despite these restraints, the long-term outlook for the AQA market remains extremely positive. The ongoing trend toward data-driven decision-making and the continuous evolution of data analytics technologies are expected to propel significant growth in the coming years. Market segmentation reveals strong growth in the cloud-based application segment and a rising demand for AI-powered AQAs. Geographically, North America and Europe currently dominate the market, but Asia-Pacific is anticipated to show rapid growth, driven by increased digitalization and technological advancements.
https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
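The {ingredient + strength + dose form} definition can be pictured as a small record type. The sketch below is illustrative only: the class, its field names, and the way the three parts are joined into a display name are assumptions made for this example, not RxNorm's own schema or naming algorithm, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalDrug:
    """A clinical drug described by its three defining components."""

    ingredient: str
    strength: str
    dose_form: str

    def display_name(self) -> str:
        # Illustrative convention: join the three components with spaces.
        return f"{self.ingredient} {self.strength} {self.dose_form}"


# Hypothetical sample values, used only to show the structure.
drug = ClinicalDrug(ingredient="Acetaminophen", strength="500 MG", dose_form="Oral Tablet")
print(drug.display_name())
```

Two records with the same three components compare equal, which mirrors the idea that the combination, rather than any single field, identifies the clinical drug.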
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source info
RXNCUI contains retired rxcui codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
Fork this kernel to get started with this dataset.
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
Banner Photo by @freestocks from Unsplash.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Hacker News corpus, 2007-Nov 2022
Dataset Description
Dataset Summary
Dataset Name: Hacker News Full Corpus (2007 - November 2022) Description:
NOTE: I am not affiliated with Y Combinator.
This dataset is a July 2023 snapshot of YCombinator's BigQuery dump of the entire archive of posts and comments made on Hacker News. It contains posts from Hacker News' inception in 2007 through to November 16, 2022, when the BigQuery database was last updated. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/jkeisling/hacker-news-corpus-2007-2022.
This database contains the data reported in the Annual Homeless Assessment Report to Congress (AHAR). It represents a point-in-time (PIT) count of homeless individuals, as well as a housing inventory count (HIC) conducted annually.
The data represent the most comprehensive national-level assessment of homelessness in America, including PIT and HIC estimates of homelessness, as well as estimates of chronically homeless persons, homeless veterans, and homeless children and youth.
These data can be trended over time and correlated with other metrics of housing availability and affordability, in order to better understand the particular type of housing resources that may be needed from a social determinants of health perspective.
HUD captures these data annually through the Continuum of Care (CoC) program. CoC-level reporting data have been crosswalked to county levels for purposes of analysis of this dataset.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.sdoh_hud_pit_homelessness
What has been the change in the number of homeless veterans in the state of New York’s CoC Regions since 2012? Determine how the patterns of homeless veterans have changed across the state of New York.
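-- Veteran counts per New York CoC in the 2012 baseline year
WITH homeless_2012 AS (
SELECT Homeless_Veterans AS Vet12, CoC_Name
FROM `bigquery-public-data.sdoh_hud_pit_homelessness.hud_pit_by_coc`
WHERE SUBSTR(CoC_Number,0,2) = "NY" AND Count_Year = 2012
),
-- Veteran counts per New York CoC in 2018
homeless_2018 AS (
SELECT Homeless_Veterans AS Vet18, CoC_Name
FROM `bigquery-public-data.sdoh_hud_pit_homelessness.hud_pit_by_coc`
WHERE SUBSTR(CoC_Number,0,2) = "NY" AND Count_Year = 2018
),
veterans_change AS ( SELECT homeless_2012.CoC_Name, Vet12, Vet18, Vet18 - Vet12 AS VetChange FROM homeless_2018 JOIN homeless_2012 ON homeless_2018.CoC_Name = homeless_2012.CoC_Name )
SELECT * FROM veterans_change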
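The same comparison can also be driven from the BigQuery Python client library mentioned above. The sketch below is one possible arrangement, not the dataset authors' method: it fetches veteran counts for one state and year with a parameterized query, then computes the per-CoC change in plain Python. It assumes the `google-cloud-bigquery` package is installed, credentials are configured, and `Count_Year` is an integer column; the helper names are made up for this example.

```python
SQL = """
SELECT CoC_Name, Homeless_Veterans
FROM `bigquery-public-data.sdoh_hud_pit_homelessness.hud_pit_by_coc`
WHERE STARTS_WITH(CoC_Number, @state)
  AND Count_Year = @year
  AND Homeless_Veterans IS NOT NULL
"""


def fetch_veteran_counts(state: str, year: int) -> dict:
    """Return {CoC_Name: Homeless_Veterans} for one state prefix and year."""
    # Imported here so the rest of the module works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("state", "STRING", state),
            # Assumes Count_Year is stored as an integer.
            bigquery.ScalarQueryParameter("year", "INT64", year),
        ]
    )
    rows = client.query(SQL, job_config=job_config).result()
    return {row["CoC_Name"]: row["Homeless_Veterans"] for row in rows}


def veteran_change(counts_start: dict, counts_end: dict) -> dict:
    """Return end minus start for every CoC present in both years."""
    shared = counts_start.keys() & counts_end.keys()
    return {name: counts_end[name] - counts_start[name] for name in shared}
```

A call such as `veteran_change(fetch_veteran_counts("NY", 2012), fetch_veteran_counts("NY", 2018))` would reproduce the inner-join behavior of the SQL: CoCs missing from either year are dropped.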