Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains results of various metric tests performed with the SPARQL query engine nLDE (the network of Linked Data Eddies) in different configurations. The queries themselves are available via the nLDE website, and the tests are explained in depth in the associated publication.

To compute the diefficiency metrics dief@t and dief@k, we need the answer trace produced by the SPARQL query engines when executing queries. Answer traces record the exact point in time at which an engine produces each answer while executing a query. We executed SPARQL queries using three different configurations of the nLDE engine: Selective, NotAdaptive, and Random. The resulting answer trace for each query execution is stored in the CSV file nLDEBenchmark1AnswerTrace.csv, with the following structure:
- query: id of the executed query. Example: 'Q9.sparql'
- approach: name of the approach (or engine) used to execute the query.
- tuple: the value i indicates that this row corresponds to the i-th answer produced by approach when executing query.
- time: elapsed time (in seconds) from the start of the execution of query by approach until answer i is produced.

In addition, to compare the performance of the nLDE engine using dief@t and dief@k as well as conventional metrics from the query processing literature (execution time, time for the first tuple, and number of answers produced), we also measured the performance of the nLDE engine with these conventional metrics. The results are available in the CSV file nLDEBenchmark1Metrics.csv, with the following structure:
- query: id of the executed query. Example: 'Q9.sparql'
- approach: name of the approach (or engine) used to execute the query.
- tfft: time (in seconds) required by approach to produce the first tuple when executing query.
- totaltime: elapsed time (in seconds) from the start of the execution of query by approach until the last answer of query is produced.
- comp: number of answers produced by approach when executing query.
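The dief@t metric is defined in the associated publication as the area under the answer-trace curve (answers produced as a function of time) up to time t; higher values indicate more continuous answer production. A minimal sketch of that computation for one (query, approach) group of the answer trace — not the authors' reference implementation, just an illustration using the `time` column described above:

```python
def dief_at_t(answer_times, t):
    """Area under the step function 'answers produced so far vs. time'
    from 0 up to time t. `answer_times` holds the `time` values of one
    (query, approach) group from nLDEBenchmark1AnswerTrace.csv."""
    times = sorted(x for x in answer_times if x <= t)
    area = 0.0
    for i, ti in enumerate(times):
        # The answer count is i + 1 from time ti until the next answer (or t).
        t_next = times[i + 1] if i + 1 < len(times) else t
        area += (i + 1) * (t_next - ti)
    return area
```

For example, an engine that produces answers at 1, 2, and 3 seconds accumulates an area of 6.0 by t = 4.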
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Blockchain data query: Clean version of the execute console, to renew a used one
This dataset contains executions of each query from the Join Order Benchmark on PostgreSQL 10.5. The data is loaded, appropriate indexes are created, and each query is run once with a cold cache. Each file contains the output of prepending EXPLAIN (FORMAT JSON, ANALYZE) to the corresponding SQL query. Queries are executed in a VirtualBox VM created by Vagrant. The VM has two cores and 8 GB of memory. The PostgreSQL buffer size is set to 4 GB. The VM runs on a machine with 16 GB of RAM and an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz.
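PostgreSQL's EXPLAIN (FORMAT JSON, ANALYZE) emits a one-element JSON array whose object carries the plan tree and timing fields, so each file in this dataset can be summarized with a few lines of Python. A sketch (field names follow PostgreSQL's JSON explain format; the `sample` string is a tiny fabricated plan for illustration, not taken from the dataset):

```python
import json

def summarize_explain(text):
    """Extract headline timings from one EXPLAIN (FORMAT JSON, ANALYZE) output."""
    doc = json.loads(text)[0]  # top level is a one-element JSON array
    plan = doc["Plan"]
    return {
        "node": plan["Node Type"],                  # root plan node
        "actual_ms": plan["Actual Total Time"],     # root node runtime
        "execution_ms": doc.get("Execution Time"),  # whole-query runtime
    }

# Minimal fabricated example of the format (the files in this dataset are much larger):
sample = '[{"Plan": {"Node Type": "Seq Scan", "Actual Total Time": 12.3}, "Execution Time": 13.1}]'
summary = summarize_explain(sample)
```

Applying `summarize_explain` to each file gives a quick per-query runtime table for the benchmark.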
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
The "Wikipedia SQLite Portable DB" is a compact and efficient database derived from the Kensho Derived Wikimedia Dataset (KDWD). This dataset provides a condensed subset of raw Wikimedia data in a format optimized for natural language processing (NLP) research and applications.
I am not affiliated or partnered with Kensho in any way; I just really like this dataset because it gives my agents something easy to query.
Key Features:
- Contains over 5 million rows of data from English Wikipedia and Wikidata
- Stored in a portable SQLite database format for easy integration and querying
- Includes a link-annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base
- Ideal for NLP tasks, machine learning, data analysis, and research projects
The database consists of four main tables: pages, items, link_annotated_text, and properties.
This dataset is derived from the Kensho Derived Wikimedia Dataset (KDWD), which is built from the English Wikipedia snapshot from December 1, 2019, and the Wikidata snapshot from December 2, 2019. The KDWD is a condensed subset of the raw Wikimedia data in a form that is helpful for NLP work, and it is released under the CC BY-SA 3.0 license.

Credits: The "Wikipedia SQLite Portable DB" is derived from the Kensho Derived Wikimedia Dataset (KDWD), created by the Kensho R&D group. The KDWD is based on data from Wikipedia and Wikidata, which are crowd-sourced projects supported by the Wikimedia Foundation. We would like to acknowledge and thank the Kensho R&D group for their efforts in creating the KDWD and making it available for research and development purposes. By providing this portable SQLite database, we aim to make Wikipedia data more accessible and easier to use for researchers, data scientists, and developers working on NLP tasks, machine learning projects, and other data-driven applications. We hope that this dataset will contribute to the advancement of NLP research and the development of innovative applications utilizing Wikipedia data.
https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data/data
Tags: encyclopedia, wikipedia, sqlite, database, reference, knowledge-base, articles, information-retrieval, natural-language-processing, nlp, text-data, large-dataset, multi-table, data-science, machine-learning, research, data-analysis, data-mining, content-analysis, information-extraction, text-mining, text-classification, topic-modeling, language-modeling, question-answering, fact-checking, entity-recognition, named-entity-recognition, link-prediction, graph-analysis, network-analysis, knowledge-graph, ontology, semantic-web, structured-data, unstructured-data, data-integration, data-processing, data-cleaning, data-wrangling, data-visualization, exploratory-data-analysis, eda, corpus, document-collection, open-source, crowdsourced, collaborative, online-encyclopedia, web-data, hyperlinks, categories, page-views, page-links, embeddings
Usage with LIKE queries:

```
import asyncio

import aiosqlite


class KenshoDatasetQuery:
    def __init__(self, db_file):
        self.db_file = db_file

    async def __aenter__(self):
        self.conn = await aiosqlite.connect(self.db_file)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.conn.close()

    async def search_pages_by_title(self, title):
        query = """
            SELECT pages.page_id, pages.item_id, pages.title, pages.views,
                   items.labels AS item_labels, items.description AS item_description,
                   link_annotated_text.sections
            FROM pages
            JOIN items ON pages.item_id = items.id
            JOIN link_annotated_text ON pages.page_id = link_annotated_text.page_id
            WHERE pages.title LIKE ?
        """
        async with self.conn.execute(query, (f"%{title}%",)) as cursor:
            return await cursor.fetchall()

    async def search_items_by_label_or_description(self, keyword):
        query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ? OR description LIKE ?
        """
        async with self.conn.execute(query, (f"%{keyword}%", f"%{keyword}%")) as cursor:
            return await cursor.fetchall()

    async def search_items_by_label(self, label):
        query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ?
        """
        async with self.conn.execute(query, (f"%{label}%",)) as cursor:
            return await cursor.fetchall()

    async def search_properties_by_label_or_desc...
```
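For quick experiments without the async wrapper, the same JOIN + LIKE pattern works with the stdlib sqlite3 module. A sketch using an in-memory database with a tiny fabricated fixture (the table and column names mirror the snippet above; the rows are invented for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in schema for the tables referenced above.
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, item_id INTEGER, title TEXT, views INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, labels TEXT, description TEXT);
CREATE TABLE link_annotated_text (page_id INTEGER, sections TEXT);
""")
conn.execute("INSERT INTO items VALUES (1, 'Ada Lovelace', 'mathematician')")
conn.execute("INSERT INTO pages VALUES (10, 1, 'Ada Lovelace', 12345)")
conn.execute("INSERT INTO link_annotated_text VALUES (10, '[]')")

# Parameterized LIKE search, same shape as search_pages_by_title above.
rows = conn.execute("""
    SELECT pages.title, items.description
    FROM pages
    JOIN items ON pages.item_id = items.id
    JOIN link_annotated_text ON pages.page_id = link_annotated_text.page_id
    WHERE pages.title LIKE ?
""", ("%Ada%",)).fetchall()
```

Point the `connect` call at the downloaded database file instead of `":memory:"` to run the same query against the real data.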
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Blockchain data query: infrared execute
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Physical Properties of Rivers: Querying Metadata and Discharge Data
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the second part of a two-part exercise focusing on the physical properties of rivers.
Introduction
Rivers are bodies of freshwater flowing from higher elevations to lower elevations due to the force of gravity. One of the most important physical characteristics of a stream or river is discharge, the volume of water moving through the river or stream over a given amount of time. Discharge can be measured directly by measuring the flow velocity at several spots in a stream and multiplying the velocity by the cross-sectional area of the stream. However, this method is effort-intensive. This exercise will demonstrate how to approximate discharge by developing a rating curve for a stream at a given sampling point. You will also learn to query metadata and to compare discharge patterns across climatically different regions of the United States.
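A rating curve relates stage (gage height, H) to discharge (Q); one common choice, which may differ from the form used in the lesson itself, is a power law Q = a·H^b fitted by least squares in log-log space. A minimal sketch with fabricated stage/discharge pairs for illustration:

```python
import math

def fit_rating_curve(stage, discharge):
    """Fit Q = a * H**b by ordinary least squares in log-log space."""
    xs = [math.log(h) for h in stage]
    ys = [math.log(q) for q in discharge]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Fabricated stage (ft) / discharge (cfs) pairs; noiseless, so the fit is exact.
stage = [1.0, 1.5, 2.0, 3.0, 4.0]
discharge = [3.0 * h ** 1.5 for h in stage]
a, b = fit_rating_curve(stage, discharge)
```

Once fitted at a sampling point, the curve lets you estimate discharge from stage readings alone, avoiding repeated direct velocity measurements.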
Learning Objectives
After successfully completing this exercise, you will be able to:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Benchmark Execution with: 250 feasible queries, a 400M-triple DBpedia dataset, 16 querying users, 16 update users, and 250 changesets
In the past, the majority of data analysis use cases were addressed by aggregating relational data. For a few years now, a trend called "Big Data" has been evolving, which has several implications for the field of data analysis. Compared to previous applications, much larger data sets are analyzed using more elaborate and diverse analysis methods such as information extraction techniques, data mining algorithms, and machine learning methods. At the same time, analysis applications include data sets with less or even no structure at all. This evolution has implications for the requirements on data processing systems. Due to the growing size of data sets and the increasing computational complexity of advanced analysis methods, data must be processed in a massively parallel fashion. The large number and diversity of data analysis techniques, as well as the lack of data structure, motivate the use of user-defined functions and data types. Many traditional database systems are not flexible enough to satisfy these requirements. Hence, there is a need for programming abstractions to define and efficiently execute complex parallel data analysis programs that support custom user-defined operations. The success of the SQL query language has shown the advantages of declarative query specification, such as the potential for optimization and ease of use. Today, most relational database management systems feature a query optimizer that compiles declarative queries into physical execution plans. Cost-based optimizers choose from billions of plan candidates the plan with the least estimated cost. However, traditional optimization techniques cannot be readily integrated into systems that aim to support novel data analysis use cases. For example, the use of user-defined functions (UDFs) can significantly limit the optimization potential of data analysis programs. Furthermore, a lack of detailed data statistics is common when large amounts of unstructured data are analyzed.
This leads to imprecise optimizer cost estimates, which can cause sub-optimal plan choices. In this thesis we address three challenges that arise in the context of specifying and optimizing data analysis programs. First, we propose a parallel programming model with declarative properties to specify data analysis tasks as data flow programs. In this model, data processing operators are composed of a system-provided second-order function and a user-defined first-order function. A cost-based optimizer compiles data flow programs specified in this abstraction into parallel data flows. The optimizer borrows techniques from relational optimizers and ports them to the domain of general-purpose parallel programming models. Second, we propose an approach to enhance the optimization of data flow programs that include UDF operators with unknown semantics. We identify operator properties and conditions to reorder neighboring UDF operators without changing the semantics of the program. We show how to automatically extract these properties from UDF operators by leveraging static code analysis techniques. Our approach is able to emulate relational optimizations such as filter and join reordering and holistic aggregation push-down while not being limited to relational operators. Finally, we analyze the impact of changing execution conditions, such as varying predicate selectivities and memory budgets, on the performance of relational query plans. We identify plan patterns that cause significantly varying execution performance under changing execution conditions. Plans that include such risky patterns are prone to cause problems in the presence of imprecise optimizer estimates. Based on our findings, we introduce an approach to avoid risky plan choices. Moreover, we present a method to assess the risk of a query execution plan using a machine-learned prediction model. Experiments show that the prediction model outperforms risk predictions computed from optimizer estimates.
According to our latest research, the global Query Plan Optimization with LLMs market size reached USD 1.62 billion in 2024, demonstrating robust momentum driven by the accelerating adoption of AI-powered database solutions. The market is expected to expand at a healthy CAGR of 18.7% from 2025 to 2033, reaching a forecasted market size of USD 8.09 billion by 2033. Key growth factors include the rising complexity of data environments, the increasing need for efficient data management, and the rapid integration of Large Language Models (LLMs) into enterprise-level query optimization processes.
One of the primary growth drivers for the Query Plan Optimization with LLMs market is the exponential increase in data volumes across industries. Organizations are continually generating and storing vast amounts of structured and unstructured data, which necessitates advanced techniques for efficient data retrieval and processing. Traditional query optimization methods often struggle to keep pace with the scale and complexity of modern data environments. The integration of LLMs into query plan optimization processes introduces a new paradigm, leveraging advanced natural language processing and machine learning capabilities to interpret, optimize, and execute complex queries with greater speed and accuracy. This technological shift is enabling organizations to extract actionable insights from their data repositories more efficiently, thereby enhancing decision-making and operational performance.
Another significant factor propelling the market growth is the increasing adoption of cloud-based data solutions and the proliferation of hybrid IT architectures. As enterprises migrate their workloads to the cloud, the need for scalable, intelligent, and automated query optimization tools becomes paramount. LLM-powered query optimizers are particularly well-suited to cloud environments, where dynamic resource allocation and multi-tenant architectures present unique challenges for query execution. By automating the selection and tuning of optimal query plans, these solutions reduce latency, improve throughput, and lower operational costs. Furthermore, the flexibility of cloud deployment models allows organizations of all sizes to leverage advanced query optimization capabilities without substantial upfront investments in infrastructure.
The growing emphasis on business intelligence, real-time analytics, and data-driven strategies across sectors further amplifies the demand for efficient query plan optimization. Enterprises are increasingly relying on advanced analytics platforms to gain competitive advantages, and LLMs offer a transformative approach to query optimization by understanding the intent behind queries and adapting to evolving data schemas. This adaptability is particularly valuable in sectors such as finance, healthcare, and retail, where data requirements are complex and constantly changing. The ability of LLMs to learn from historical query patterns and continuously improve optimization strategies positions them as a vital component in the modern data stack, fueling sustained market growth.
Regionally, North America remains at the forefront of the Query Plan Optimization with LLMs market, supported by a mature IT infrastructure, significant investments in AI research, and the presence of leading technology vendors. However, Asia Pacific is emerging as a high-growth region, driven by rapid digital transformation, expanding cloud adoption, and government initiatives supporting AI innovation. Europe also demonstrates strong potential, particularly in sectors such as finance and manufacturing, where regulatory compliance and data privacy are critical considerations. The market landscape is further diversified by the increasing participation of Latin America and the Middle East & Africa, where digitalization efforts are gaining momentum and creating new opportunities for advanced database solutions.
Within the Query Plan O
U.S. Government Works: https://www.usa.gov/government-works
This dataset provides an archive list of vendors and contracts to be executed by the Office of the Chief Procurement Officer. Data starting in June 2016 can be found at https://datacatalog.cookcountyil.gov/Finance-Administration/Procurement-Intent-to-Execute/ag43-fvd7
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
See the Splitgraph documentation for more information.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Cost-Aware Query Optimizers market size reached USD 1.14 billion in 2024, reflecting robust momentum driven by the increasing complexity of data environments and the urgent need for efficient query processing. The market is expected to expand at a remarkable CAGR of 13.7% from 2025 to 2033, reaching a forecasted size of USD 3.41 billion by 2033. The primary growth factor is the surge in data-driven decision-making, compelling enterprises to adopt advanced query optimization solutions that minimize operational costs and maximize performance.
A significant growth driver for the Cost-Aware Query Optimizers market is the explosive proliferation of big data and the increasing adoption of cloud-based data platforms across diverse industries. As organizations generate and store massive volumes of data, the complexity and cost of managing, querying, and analyzing this data have grown exponentially. Cost-aware query optimizers address these challenges by intelligently analyzing multiple execution plans and selecting the most cost-effective approach, thereby reducing resource consumption and accelerating query performance. The growing reliance on real-time analytics and the need for scalable data warehousing solutions further amplify the demand for these sophisticated optimization tools. As enterprises continue to invest in digital transformation initiatives, the integration of cost-aware optimization mechanisms becomes vital for maintaining operational efficiency and competitiveness.
Another major factor propelling the market growth is the increasing demand for business intelligence (BI) and analytics solutions. Organizations across sectors such as BFSI, healthcare, retail, and manufacturing are leveraging BI platforms to extract actionable insights from their data repositories. However, as the volume and complexity of queries increase, so does the cost associated with processing them, particularly in cloud environments where compute and storage resources are billed on a usage basis. Cost-aware query optimizers play a critical role in minimizing these expenses by automatically choosing the least expensive query execution paths, ensuring that businesses can maintain high performance without incurring prohibitive costs. This capability is especially crucial for enterprises with dynamic workloads and fluctuating query demands, as it enables them to optimize resource utilization and control operational expenditures.
The evolution of hybrid and multi-cloud architectures has also been instrumental in shaping the Cost-Aware Query Optimizers market. As organizations increasingly adopt hybrid data environments to balance performance, security, and cost, the complexity of query execution and optimization grows. Cost-aware query optimizers are designed to operate seamlessly across heterogeneous data platforms, enabling organizations to optimize queries irrespective of where their data resides. This flexibility is particularly valuable for global enterprises with distributed data infrastructures, as it allows them to maintain consistent query performance and cost efficiency across on-premises and cloud deployments. The integration of AI and machine learning into query optimizers further enhances their ability to predict query costs and adapt to changing workloads, making them indispensable tools for modern data-driven enterprises.
From a regional perspective, North America continues to dominate the Cost-Aware Query Optimizers market, accounting for the largest share in 2024. This leadership is attributed to the region's advanced IT infrastructure, high adoption rate of cloud technologies, and strong presence of leading technology vendors. Europe and Asia Pacific follow closely, with Asia Pacific expected to register the fastest CAGR during the forecast period, fueled by rapid digitalization and the expansion of data-centric industries. Latin America and the Middle East & Africa are also witnessing increased adoption, driven by the growing need for efficient data management solutions in emerging markets. The global landscape is characterized by a rising awareness of the benefits of cost-aware optimization, prompting organizations across all regions to invest in advanced query optimization technologies.
The component segment of the Cost-Aware Query Optimizers market is bifurcated into software and services, each pl
According to our latest research, the global SQL Query Audit Tools market size reached USD 1.42 billion in 2024, driven by the surging demand for robust database security and regulatory compliance across industries. The market is projected to grow at a CAGR of 13.8% from 2025 to 2033, reaching an estimated USD 4.13 billion by 2033. This remarkable growth is primarily fueled by the increasing sophistication of cyber threats, the proliferation of data privacy regulations, and the rapid digital transformation initiatives adopted by enterprises worldwide.
One of the most significant growth factors for the SQL Query Audit Tools market is the mounting pressure on organizations to ensure data security and compliance with stringent regulatory frameworks such as GDPR, HIPAA, SOX, and PCI DSS. As data breaches and insider threats become more prevalent, businesses are investing heavily in advanced SQL audit solutions to monitor, detect, and respond to suspicious database activities in real time. These tools provide comprehensive visibility into SQL queries, user access patterns, and configuration changes, thereby enabling organizations to proactively mitigate risks and avoid costly penalties. Furthermore, the increasing adoption of cloud-based databases and hybrid IT environments has created new vulnerabilities, intensifying the need for robust SQL query auditing capabilities across diverse deployment models.
Another key driver propelling the market is the growing emphasis on performance optimization and operational efficiency within database management. Organizations are increasingly leveraging SQL query audit tools not only for security but also for performance monitoring, anomaly detection, and resource utilization analysis. By continuously auditing and analyzing SQL queries, businesses can identify bottlenecks, optimize query execution, and ensure high availability of mission-critical applications. The integration of artificial intelligence and machine learning into these tools further enhances their ability to detect complex threats, automate compliance reporting, and provide actionable insights for database administrators. This convergence of security, compliance, and performance monitoring functionalities is expected to accelerate the adoption of SQL query audit tools across various industry verticals.
The proliferation of digital transformation initiatives across sectors such as BFSI, healthcare, retail, and government is also contributing to the robust growth of the SQL Query Audit Tools market. As organizations migrate to cloud-based infrastructures and embrace data-driven decision-making, the volume, velocity, and variety of data being stored and processed in SQL databases are increasing exponentially. This surge in data complexity necessitates advanced auditing solutions capable of scaling with enterprise needs while maintaining stringent security and compliance standards. Additionally, the rise of remote work and distributed teams has expanded the attack surface, further emphasizing the importance of comprehensive SQL query auditing to safeguard sensitive information and maintain operational resilience.
From a regional perspective, North America currently dominates the SQL Query Audit Tools market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The strong presence of major technology vendors, early adoption of advanced cybersecurity solutions, and a highly regulated business environment have positioned North America at the forefront of market growth. Meanwhile, Asia Pacific is anticipated to witness the highest CAGR over the forecast period, driven by rapid digitalization, expanding IT infrastructure, and increasing awareness of data privacy issues among enterprises in emerging economies. The Middle East & Africa and Latin America are also expected to experience steady growth as organizations in these regions prioritize database security and compliance in response to evolving cyber threats and regulatory mandates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Blockchain data query: Cost to Execute Popular Transactions
Etalab Open License 2.0: https://spdx.org/licenses/etalab-2.0.html
This dataset is composed of the 28 SPARQL queries executed to generate the measurement tables included in the companion dataset containing the data tables resulting from the query executions. The files have the same names and differ only by their extension. For example, CWG_reception_fallingNumber_raw.sparql is the file containing the SPARQL query executed to obtain the table included in the file CWG_reception_fallingNumber_raw.tsv.
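The naming convention above (same basename, different extension) makes it trivial to pair each query with its results table programmatically; a short sketch:

```python
from pathlib import Path

def results_file_for(query_file):
    """Map a .sparql query file name to its .tsv results file name
    under the naming convention described above (same basename)."""
    return Path(query_file).with_suffix(".tsv")

tsv = results_file_for("CWG_reception_fallingNumber_raw.sparql")
```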
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Blockchain data query: v0.7 Entrypoint execute() calls with gas allocated much higher than gas used
https://dataintelo.com/privacy-and-policy
According to our latest research, the SQL Query Optimization with AI market size reached USD 1.32 billion in 2024, propelled by the rapid adoption of artificial intelligence in database management and analytics. The market is projected to grow at a robust CAGR of 22.1% from 2025 to 2033, reaching a forecasted value of USD 9.85 billion by 2033. This remarkable growth is primarily driven by the increasing need for real-time data processing, the proliferation of complex data environments, and the demand for enhanced application performance across industries.
The surge in digital transformation initiatives across various sectors is one of the most significant growth factors for the SQL Query Optimization with AI market. Enterprises are increasingly relying on data-driven decision-making, which necessitates efficient and scalable database systems. AI-powered SQL query optimization tools help organizations streamline query execution, reduce latency, and maximize resource utilization. With the explosion of big data and the adoption of cloud-based infrastructures, businesses are seeking advanced solutions that can automate query tuning, detect anomalies, and dynamically adapt to changing workloads. The integration of machine learning algorithms into SQL optimization processes is enabling predictive analytics, self-healing databases, and automated performance tuning, further fueling market expansion.
Another key driver is the escalating complexity of enterprise data ecosystems. Organizations today manage vast volumes of structured and unstructured data from multiple sources, including IoT devices, transactional systems, and external APIs. As data environments grow more intricate, manual query optimization becomes increasingly impractical and error-prone. AI-driven SQL optimization platforms address these challenges by continuously monitoring query performance, identifying bottlenecks, and suggesting optimal execution plans. This not only improves database efficiency but also reduces the burden on database administrators, allowing them to focus on higher-value tasks. The growing adoption of hybrid and multi-cloud strategies is also contributing to the demand for intelligent query optimization solutions that ensure consistent performance across diverse environments.
Furthermore, the rise of regulatory compliance requirements and data privacy concerns is pushing organizations to invest in advanced database management solutions. AI-powered SQL query optimization tools can help ensure data integrity, minimize risks, and maintain compliance with industry standards such as GDPR, HIPAA, and PCI DSS. By automating query auditing, access control, and anomaly detection, these solutions enhance security and transparency in data operations. The increasing emphasis on customer experience, operational agility, and cost optimization is prompting enterprises to adopt AI-enabled query optimization as a strategic differentiator, driving sustained growth in the market.
From a regional perspective, North America currently dominates the SQL Query Optimization with AI market, accounting for the largest revenue share due to the presence of leading technology vendors, early adoption of AI, and a mature IT infrastructure. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digitalization, expanding cloud adoption, and the emergence of data-centric business models in countries like China, India, and Japan. Europe is also experiencing steady growth, fueled by stringent data protection regulations and increasing investments in AI-driven database management solutions. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by government initiatives to promote digital transformation and the growing penetration of cloud services.
The Component segment of the SQL Query Optimization with AI market is categorized into Software, Hardware, and Services. Software solutions represent the largest share of the market, as they form the backbone of AI-driven query optimization processes. These include advanced query analyzers, AI-powered database management platforms, and automated performance tuning tools that leverage machine learning algorithms to optimize SQL queries in real time. The proliferation of open-source frameworks and the integration of AI capabilities into existing database manage
GNU AGPL v3.0: http://www.gnu.org/licenses/agpl-3.0.html
This database contains all publications in the field of space biology from 2010 to 2025. It includes publication links, titles, release dates, and author information, going beyond raw content to capture structured data and relationships.
The database was built by scraping PubMed Central using Playwright, cleaned with BeautifulSoup, and stored in a SQLite database using Peewee ORM. This setup allows you to query publications, authors, and co-authorship relationships efficiently using SQL. You can find the script that created this DB on GitHub.
Several interesting trends were uncovered using this database by running the example queries given at the bottom; we have even pasted the results on this page for you to see.
We encourage you to follow the guide below to get started using SQL for revealing some exciting new insights in Space Biology using our database!
You will need sqlite3 installed to interact with the database. On Linux, you can install it using:
sudo apt install sqlite3
On macOS:
brew install sqlite
On Windows, one can follow this and further instructions, but it will be much easier to install Windows Subsystem for Linux and continue as a Linux user.
Once you have navigated to the folder and installed sqlite3, you can run SQL on the DB. Here's a template command:
sqlite3 -header -column space_bio.db "YOUR QUERY;"
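If you'd rather stay in Python, the standard library's sqlite3 module runs the same SQL. A minimal sketch follows; with the real data you would connect to space_bio.db directly, but the snippet below builds a tiny in-memory stand-in with two hypothetical rows so it is self-contained:

```python
import sqlite3

# For the real data: conn = sqlite3.connect("space_bio.db")
# Here we use an in-memory stand-in with invented rows for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pubs (link TEXT PRIMARY KEY, title TEXT, date DATE, content TEXT);
    INSERT INTO pubs VALUES
        ('https://example.org/p1', 'Microgravity and bone loss', '2018-06-01', '...'),
        ('https://example.org/p2', 'Radiation effects on plants', '2021-03-15', '...');
""")

# Any query string from the CLI template works unchanged here.
for row in conn.execute("SELECT title, date FROM pubs ORDER BY date"):
    print(row)
```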
Below, the DB schema is given, and at the bottom of the page several example queries are given for you to try out. If you want to execute Peewee ORM queries, you can check out the repo that created this DB: db.py in the database folder has detailed model definitions.
Pubs – Publications

| Column | Type | Description |
|---|---|---|
| link | TEXT (Primary Key) | Unique URL or DOI of the publication |
| title | TEXT | Title of the publication (unique per DB, but may not be globally unique) |
| date | DATE | Publication date (YYYY-MM-DD) |
| content | TEXT | Full content of the publication |
Purpose: Store all publication details and content for analysis.
Authors – Authors

| Column | Type | Description |
|---|---|---|
| name | TEXT (Primary Key) | Full name of the author |
Purpose: List all authors in the dataset.
PubAuthors – Many-to-Many Relationship

| Column | Type | Description |
|---|---|---|
| publication_id | TEXT (FK → Pubs.link) | Publication identifier |
| author_id | TEXT (FK → Authors.name) | Author name |
Primary Key: Composite (publication_id, author_id)
Purpose: Link publications to authors, enabling queries about co-authorships, prolific authors, and collaboration patterns.
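Putting the three tables together, the schema corresponds roughly to the DDL below. This is a sketch inferred from the tables above (the exact DDL that Peewee emits may differ in type affinities and constraint spelling); the composite primary key on PubAuthors is what prevents duplicate author–publication links:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pubs (
        link    TEXT PRIMARY KEY,   -- unique URL or DOI
        title   TEXT,
        date    DATE,               -- YYYY-MM-DD
        content TEXT
    );
    CREATE TABLE authors (
        name TEXT PRIMARY KEY
    );
    CREATE TABLE pubauthors (
        publication_id TEXT REFERENCES pubs(link),
        author_id      TEXT REFERENCES authors(name),
        PRIMARY KEY (publication_id, author_id)
    );
""")

# The composite key rejects a duplicated (publication, author) pair:
conn.execute("INSERT INTO pubauthors VALUES ('p1', 'A. Author')")
try:
    conn.execute("INSERT INTO pubauthors VALUES ('p1', 'A. Author')")
except sqlite3.IntegrityError as e:
    print("duplicate link rejected:", e)
```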
Top 5 most prolific authors by publication count:

sqlite3 -header -column space_bio.db "
SELECT
a.name,
COUNT(pa.publication_id) as pub_count
FROM authors a
JOIN pubauthors pa ON a.name = pa.author_id
GROUP BY a.name
ORDER BY pub_count DESC
LIMIT 5;
"
Top 5 publications by number of authors:

sqlite3 -header -column space_bio.db "
SELECT
p.title,
COUNT(pa.author_id) as author_count
FROM pubs p
JOIN pubauthors pa ON p.link = pa.publication_id
GROUP BY p.link
ORDER BY author_count DESC
LIMIT 5;
"
Publications per year:

sqlite3 -header -column space_bio.db "
SELECT
strftime('%Y', date) as year,
COUNT(*) as pub_count
FROM pubs
GROUP BY year
ORDER BY year;
"
| Name | Pub Count |
|---|---|
| Kasthuri Venkateswaran | 54 |
| Christopher E Mason | 49 |
| Afshin Beheshti | 29 |
| Sylvain V Costes | 29 |
| Nitin K Singh | 24 |
| Title | Author Count |
|---|---|
| The Space Omics and Medical Atlas (SOMA) and international astronaut biobank | 109 |
| Cosmic kidney disease: an integrated pan... | |
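One more query the schema supports, not shown above: the most frequent co-author pairs, via a self-join on pubauthors. Sketched here in Python against a tiny in-memory sample with invented names, since space_bio.db itself isn't bundled with this page; against the real database, drop the setup and connect to the file instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use sqlite3.connect("space_bio.db") for the real data
conn.executescript("""
    CREATE TABLE pubauthors (publication_id TEXT, author_id TEXT,
                             PRIMARY KEY (publication_id, author_id));
    INSERT INTO pubauthors VALUES
        ('p1', 'Ada'), ('p1', 'Ben'), ('p2', 'Ada'), ('p2', 'Ben'), ('p2', 'Cy');
""")

# Self-join: each result row pairs two distinct authors of the same publication;
# the a.author_id < b.author_id filter keeps only one ordering of each pair.
pairs = conn.execute("""
    SELECT a.author_id, b.author_id, COUNT(*) AS shared
    FROM pubauthors a
    JOIN pubauthors b
      ON a.publication_id = b.publication_id AND a.author_id < b.author_id
    GROUP BY a.author_id, b.author_id
    ORDER BY shared DESC
""").fetchall()
print(pairs)  # ('Ada', 'Ben') share 2 publications; the other pairs share 1
```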
The fdt-sqlalchemy extension for CKAN integrates the SQLAlchemy panel from Flask Debug Toolbar. This integration enables developers to inspect SQLAlchemy queries executed during CKAN's operation. The extension facilitates debugging and optimization by providing insights into database interactions.

Key Features:
- SQLAlchemy Query Inspection: displays a list of SQLAlchemy queries executed during a request.
- Query Explanation: provides EXPLAIN output for individual queries, allowing for query performance analysis. (Note: functionality confirmed only for flask-sqlalchemy~=2.5.0.)
- SELECT Link: provides a link to view the SELECT query directly for executed queries, simplifying debugging effort. (Note: functionality confirmed only for flask-sqlalchemy~=2.5.0.)
- Compatibility: supports CKAN versions 2.9 and 2.10; requires flask-sqlalchemy~=2.5.0 for full functionality.

Technical Integration: The extension is enabled by adding fdt_sqlalchemy to the ckan.plugins setting in the CKAN configuration file. It integrates with Flask Debug Toolbar to present database query information within the toolbar interface, and requires the Flask-SQLAlchemy library for its core functionality.

Benefits & Impact: By exposing SQLAlchemy queries within the Flask Debug Toolbar, fdt-sqlalchemy helps CKAN developers identify slow or inefficient database queries. This results in improved application performance through query optimization and a better understanding of CKAN's database interactions.
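Enabling the extension is a one-line change to the CKAN configuration file (typically ckan.ini or production.ini). A sketch follows; the other plugins listed are illustrative placeholders, so keep whatever plugins you already have and simply append fdt_sqlalchemy:

```ini
[app:main]
; ... existing settings ...
ckan.plugins = stats text_view fdt_sqlalchemy
; Flask Debug Toolbar only renders when debug mode is on (never in production):
debug = true
```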
According to our latest research, the global cost-based query optimizer service market size reached USD 1.37 billion in 2024, reflecting robust adoption across diverse industries. The market is anticipated to expand at a CAGR of 12.8% from 2025 to 2033, projecting a value of USD 4.13 billion by 2033. This growth is primarily driven by the increasing complexity of data environments and the need for advanced optimization solutions in data management systems.
One of the key growth factors for the cost-based query optimizer service market is the exponential rise in data volume and complexity within modern enterprises. Organizations are generating vast amounts of structured and unstructured data daily, necessitating sophisticated query optimization to ensure efficient data retrieval and processing. The proliferation of cloud-based data platforms and distributed databases has further intensified the need for advanced query optimization services. As businesses strive to extract actionable insights from big data, cost-based query optimizers play a crucial role in minimizing resource consumption and reducing query execution time, thereby improving operational efficiency and decision-making capabilities.
Another significant driver fueling market expansion is the digital transformation initiatives being undertaken by enterprises worldwide. As organizations migrate their workloads to cloud and hybrid environments, the demand for scalable, flexible, and cost-effective query optimization solutions is surging. Cost-based query optimizers, which leverage mathematical models and data statistics to determine the most efficient execution plans, are increasingly being integrated into business intelligence, analytics, and data warehousing applications. This integration is enabling enterprises to enhance performance, reduce infrastructure costs, and deliver superior user experiences, thus accelerating the adoption of these services across various industry verticals.
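To make the idea concrete: a cost-based optimizer enumerates candidate execution plans, estimates each plan's cost from table statistics (row counts, selectivities), and picks the cheapest. The toy model below is purely illustrative; the table names, row counts, selectivity, and cost formula are invented for the example and are far simpler than what production optimizers use:

```python
from itertools import permutations

# Hypothetical table statistics: row counts and an assumed join selectivity.
rows = {"orders": 1_000_000, "customers": 50_000, "regions": 50}
selectivity = 0.0001  # assumed fraction of row pairs surviving each join

def plan_cost(order):
    """Estimate the cost of left-deep nested-loop joins in the given table order."""
    cost = 0.0
    intermediate = rows[order[0]]
    for table in order[1:]:
        cost += intermediate * rows[table]                       # row pairs examined
        intermediate = intermediate * rows[table] * selectivity  # rows kept
    return cost

# Enumerate all join orders and select the cheapest estimated plan.
best = min(permutations(rows), key=plan_cost)
print("cheapest join order:", best)
```

Even in this toy, joining the small tables first shrinks the intermediate result before it meets the million-row table, which is exactly the kind of decision a cost-based optimizer automates.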
The rapid advancements in artificial intelligence and machine learning technologies are also contributing to the growth of the cost-based query optimizer service market. Modern query optimizers are increasingly utilizing AI-driven algorithms to adapt to dynamic workloads, predict query performance, and automate tuning processes. These innovations are enabling organizations to handle complex queries more effectively, optimize resource allocation, and achieve higher levels of automation in their data management operations. As the competitive landscape intensifies, vendors are focusing on enhancing the intelligence and adaptability of their query optimizer solutions, further propelling market growth.
From a regional perspective, North America currently dominates the global cost-based query optimizer service market, accounting for the largest revenue share in 2024. This dominance is attributed to the presence of major technology players, early adoption of advanced database technologies, and substantial investments in cloud infrastructure. However, the Asia Pacific region is expected to exhibit the highest growth rate over the forecast period, driven by rapid digitalization, burgeoning IT and telecommunications sectors, and increasing investments in data-driven solutions across emerging economies such as China and India.
The component segment of the cost-based query optimizer service market is bifurcated into software and services. The software sub-segment dominates the market, owing to the widespread deployment of query optimization engines within database management systems, data warehouses, and analytics platforms. These software solutions are designed to analyze query execution plans, estimate resource costs, and select the most efficient strategy for data retrieval. The increasing complexity of enterprise data environments and the proliferation of multi-cloud and hybrid cloud architectures are driving the demand for robust and scalable cost-based query optimizer software.
According to our latest research, the global Query Plan Optimization with LLMs market size reached USD 1.14 billion in 2024, with a robust year-on-year growth driven by rapid enterprise adoption and AI advancements. The market is projected to expand at a CAGR of 25.7% from 2025 to 2033, reaching a forecasted value of USD 8.95 billion by 2033. This remarkable growth is primarily fueled by the increasing integration of large language models (LLMs) into data management processes, enabling organizations to achieve unprecedented efficiency and accuracy in query optimization.
The primary growth factor for the Query Plan Optimization with LLMs market is the exponential increase in data volumes and complexity across enterprises. As organizations generate and store massive datasets, the need for efficient and intelligent query optimization has become critical. LLMs, with their advanced natural language processing and deep learning capabilities, are revolutionizing traditional query planning by automating the generation, selection, and tuning of query execution plans. This results in significant performance improvements, reduced latency, and lower operational costs. Furthermore, LLM-driven optimization minimizes human intervention, reducing the risk of manual errors and enabling database administrators to focus on higher-value tasks. The convergence of AI and data infrastructure is thus a vital catalyst, accelerating the adoption of LLM-powered query optimization solutions across diverse industries.
Another significant driver is the growing demand for real-time analytics and business intelligence. Enterprises are increasingly leveraging live data streams for mission-critical decision-making, necessitating highly efficient and adaptive query execution. LLMs enable dynamic and context-aware optimization, tailoring query plans to specific workloads, data distributions, and user intents. This adaptability ensures optimal resource utilization and maximizes throughput, even in highly heterogeneous environments. The proliferation of cloud-native architectures and hybrid data ecosystems further amplifies the need for intelligent query optimization, as organizations seek to harmonize performance across on-premises, private, and public cloud platforms. As a result, vendors are heavily investing in LLM-based solutions to deliver differentiated performance and scalability.
The surge in digital transformation initiatives and the adoption of advanced analytics platforms are also propelling market growth. Industries such as BFSI, healthcare, retail, and telecommunications are deploying LLM-powered query optimization tools to enhance data-driven operations, ensure regulatory compliance, and improve customer experiences. These sectors handle complex queries over large, sensitive datasets, making efficient query planning essential for both performance and security. Moreover, the rise of low-code and no-code platforms is democratizing access to sophisticated data management capabilities, allowing non-technical users to benefit from LLM-driven query optimization. This trend is expected to further broaden the market’s addressable base, fostering innovation and competition among solution providers.
From a regional perspective, North America currently dominates the Query Plan Optimization with LLMs market due to its advanced technological infrastructure, strong presence of leading AI companies, and high enterprise cloud adoption rates. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, expanding IT investments, and the proliferation of large-scale data centers. Europe is also witnessing substantial growth, fueled by stringent data regulations and increasing focus on data governance. Latin America and the Middle East & Africa, while currently representing smaller market shares, are expected to see accelerated adoption as local enterprises embrace AI-driven digital transformation. The global market is thus characterized by dynamic regional trends, with significant opportunities for vendors to expand their footprint across both mature and emerging economies.
The Component segment of the Query Plan Optimization with LLMs market comprises software, hardware, and services, each playing a distinct role in shaping the industry landscape. The software sub-segment currently commands the largest market share, as LLM-powered query optimization is delivered primarily as software.