Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery
This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.
The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The column table names the relation we refer to; lhs and rhs reference two columns of that relation for which we found that, semantically, lhs implies rhs.
The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
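To make the exclusion criteria more concrete, the following is a minimal sketch of how a candidate lhs -> rhs could be checked. It computes the classic g3 error from the literature (the smallest fraction of tuples that must be removed for the dependency to hold exactly). This is an assumption for illustration: the g3_prime measure used for this collection is not defined here and may differ. The function name, the sample rows, and the treatment of empty strings as missing values are likewise hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Return the g3 error of lhs -> rhs, or None if no row has both values.

    Assumption: missing values are represented as None or the empty string.
    """
    # Keep only tuples where both attributes have a value.
    pairs = [
        (row[lhs], row[rhs])
        for row in rows
        if row.get(lhs) not in (None, "") and row.get(rhs) not in (None, "")
    ]
    if not pairs:
        return None  # corresponds to the "no tuple with both values" case

    # Count how often each rhs value occurs for each lhs value.
    counts = defaultdict(Counter)
    for left, right in pairs:
        counts[left][right] += 1

    # For each lhs value, keep the rows with its most frequent rhs value;
    # every other row would have to be removed for the dependency to hold.
    kept = sum(max(counter.values()) for counter in counts.values())
    return 1 - kept / len(pairs)


# Hypothetical sample rows in the spirit of the claims.csv example.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles International"},
    {"AirportCode": "", "AirportName": "Unknown"},
]
error = g3_error(rows, "AirportCode", "AirportName")
print(error)
```

A small error indicates that the dependency holds for almost all tuples. A candidate for which the function returns None has no tuple with both values present.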
Dataset References
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WikiDBs-10k (https://wikidbs.github.io/) is a corpus of relational databases built from Wikidata (https://www.wikidata.org/). This is the preliminary 10k version; the newer version with 100k databases (https://zenodo.org/records/11559814) includes more coherent databases and more diverse table and column names.
The WikiDBs-10k corpus consists of 10,000 databases. For more details, see our paper: https://ceur-ws.org/Vol-3462/TADA3.pdf (TaDA@VLDB'23)
Each database is saved in a sub-folder; the table files are provided as CSV files and the database schema as a JSON file.
We thank Till Döhmen and Madelon Hulsebos for generously providing the table statistics from their GitSchemas dataset and Jan-Micha Bodensohn for converting the dataset to SQLite files. This work has been supported by the BMBF and the state of Hesse as part of the NHR Program and the BMBF project KompAKI (grant number 02L19C150), as well as the HMWK cluster project 3AI. Finally, we want to thank hessian.AI and DFKI Darmstadt for their support.
https://dataintelo.com/privacy-and-policy
The global relational databases software market size is projected to expand from an estimated $50 billion in 2023 to approximately $85 billion by 2032, growing at a compound annual growth rate (CAGR) of 6%. The primary drivers of this growth include the increasing reliance on data-driven decision-making processes, the surge in big data analytics, and the proliferation of cloud computing technologies. As organizations across various sectors accumulate vast amounts of data, the requirement for efficient data management and storage solutions becomes critical, further propelling the market's expansion.
One of the major growth factors driving the relational databases software market is the exponential increase in data generation from various sources, such as social media, IoT devices, and enterprise applications. With the advent of technologies like machine learning and artificial intelligence, the need to store, retrieve, and analyze massive datasets in real-time has become paramount. Relational databases software offers a structured way to manage data, providing quick access and robust querying capabilities, which are essential for leveraging data insights to drive business strategies.
Another significant growth factor is the widespread adoption of cloud computing. Cloud-based relational database solutions offer numerous advantages over traditional on-premises systems, such as scalability, flexibility, cost-effectiveness, and ease of maintenance. Many organizations are migrating their data management systems to the cloud to benefit from these advantages. Cloud vendors like Amazon Web Services, Microsoft Azure, and Google Cloud are continually enhancing their database offerings, adding advanced features to attract more customers, thereby fueling market growth.
The increasing trend toward digital transformation across various industries also contributes to the market's growth. As businesses strive to stay competitive in the digital age, they are investing heavily in modernizing their IT infrastructure, including their database management systems. Relational databases software enables organizations to handle complex transactions and support high-volume operations efficiently. This capability is particularly crucial for sectors such as banking and finance, healthcare, and retail, where data integrity and availability are critical for operations.
Regionally, North America currently holds the largest market share due to the early adoption of advanced technologies and the presence of major market players. Europe follows closely, with significant investments in digital transformation initiatives. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. This growth can be attributed to the rapid technological advancements, increasing internet penetration, and the growing number of small and medium enterprises in countries like China and India. Governments in these regions are also promoting digital initiatives, further boosting market growth.
The relational databases software market is segmented by deployment mode into on-premises and cloud-based solutions. The on-premises segment, traditionally the dominant mode, involves deploying the database software within an organization's own IT infrastructure. This deployment mode offers stringent control over data security and compliance, making it a preferred choice for industries with critical data privacy concerns, such as banking and government sectors. Despite a gradual shift towards cloud solutions, on-premises deployments continue to be relevant due to these security advantages.
However, the cloud-based deployment mode is experiencing rapid growth and is expected to dominate the market by 2032. Cloud databases offer unparalleled scalability and flexibility, allowing organizations to scale their database capacity up or down based on demand. This elasticity is particularly beneficial for businesses with variable workloads, such as e-commerce platforms during peak shopping seasons. Additionally, cloud databases significantly reduce the need for heavy upfront capital expenditure in IT infrastructure, as they operate on a subscription or pay-as-you-go model, which is financially appealing to many enterprises.
Another factor contributing to the rise of cloud-based databases is the continuous innovation by leading cloud service providers. Companies like Amazon Web Services, Google Cloud Platform, and Microsoft Azure are integrating advanced features such as a
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.
WikiDBs was published as a spotlight paper in the Datasets & Benchmarks track at NeurIPS 2024.
WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
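As a rough illustration of what such a conversion involves, the sketch below loads every CSV file of one database sub-folder into an SQLite database using only the Python standard library. It is not the conversion script shipped with the corpus. The function name, the sample file cities.csv, and its contents are hypothetical, and the sketch ignores the JSON schema (all columns are created without declared types, and no keys are defined).

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def quote_identifier(name):
    """Quote a table or column name for use in an SQLite statement."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_folder(folder, conn):
    """Create one SQLite table per CSV file in `folder` (untyped columns)."""
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader)  # first row holds the column names
            table = quote_identifier(csv_path.stem)
            columns = ", ".join(quote_identifier(name) for name in header)
            conn.execute(f"CREATE TABLE {table} ({columns})")
            placeholders = ", ".join("?" for _ in header)
            conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", reader
            )
    conn.commit()


# Hypothetical stand-in for one database sub-folder with a single table.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cities.csv").write_text(
        "name,country\nDarmstadt,Germany\nWarsaw,Poland\n", encoding="utf-8"
    )
    conn = sqlite3.connect(":memory:")
    load_csv_folder(tmp, conn)
    row_count = conn.execute('SELECT COUNT(*) FROM "cities"').fetchone()[0]
    print(row_count)
```

A fuller conversion would read the JSON schema to declare column types and foreign keys, which this sketch deliberately leaves out.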
https://www.datainsightsmarket.com/privacy-policy
Relational Database Software Market Analysis
The global Relational Database Software (RDBMS) market is projected to reach USD 2413 million by 2033, expanding at a CAGR of 9.5% from 2025 to 2033. The growth is driven by factors such as increasing demand for real-time data analytics, growth in cloud computing, and proliferation of IoT devices. Market segments include application (large enterprises, SMEs) and types (cloud-based, on-premises). Notable players include Microsoft, MySQL, Oracle, SAP, and IBM.
Key Market Trends
The adoption of cloud-based RDBMS is a significant trend, as it offers scalability, flexibility, and cost efficiency. Cloud-based RDBMS enables organizations to access and manage data from anywhere, reducing infrastructure costs and maintenance efforts. Increasing data volumes and the need for real-time data analytics are also driving market growth. Organizations are leveraging RDBMS to handle large datasets, derive insights, and improve decision-making. Additionally, the growing popularity of NoSQL databases for specific use cases presents opportunities for market expansion. Regions such as North America and Europe are expected to maintain a significant market share due to early adoption and technological advancements. Emerging markets in Asia Pacific are also witnessing substantial growth, driven by the increasing demand for data management solutions in various industries.
As of June 2024, the most popular relational database management system (RDBMS) worldwide was Oracle, with a ranking score of *******. Oracle was also the most popular DBMS overall. MySQL and Microsoft SQL Server rounded out the top three.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Social network data is often prohibitively expensive to collect, limiting empirical network research. We propose an inexpensive and feasible strategy for network elicitation using Aggregated Relational Data (ARD) - responses to questions of the form "how many of your links have trait $k$?" Our method uses ARD to recover parameters of a network formation model, which permits sampling from a distribution over node- or graph-level statistics. We replicate the results of two field experiments that used network data and draw similar conclusions with ARD alone.
https://www.verifiedmarketresearch.com/privacy-policy/
The Relational Database Software Market size was estimated at USD 21.97 billion in 2024 and is projected to reach USD 45.23 billion by 2031, growing at a CAGR of 9.4% from 2024 to 2031.
Global Relational Database Software Market Drivers
Rising Demand for Efficient Data Management: Organizations across industries are generating and collecting ever-increasing volumes of data. This necessitates efficient and secure data management solutions. Relational databases, with their structured format and robust querying capabilities, offer a valuable tool to organize, manage, and analyze this data, leading to increased demand for this software.
Cloud Adoption and Scalability: The proliferation of cloud computing has significantly impacted the relational database market. Cloud-based database solutions offer scalability, flexibility, and reduced IT infrastructure burden for businesses. This makes them particularly attractive for small and medium-sized enterprises (SMEs) and facilitates easier data access for geographically dispersed teams.
Growing Importance of Data Security and Compliance: Data breaches and cyberattacks pose significant threats to businesses. Relational database software vendors are constantly innovating to enhance security features like encryption and access controls. Additionally, stringent data privacy regulations like GDPR and CCPA are driving the need for compliant data storage and management solutions, further propelling the market for secure relational databases.
https://www.statsndata.org/how-to-order
The Relational Databases Software market is a cornerstone of modern data management, enabling organizations to efficiently store, retrieve, and manipulate structured data. As businesses increasingly rely on data-driven decisions, relational databases serve a critical role across various industries, including finance
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A full dump of OCTOPUS PostgreSQL database v.2.1 as published upon
a semantic database redesign (effective db v.2),
the creation of a fully relational PostgreSQL database that uses the PostGIS spatial extension (effective db v.2),
moving the database to GCP (effective db v.2),
fostered FAIR, OPEN and CARE principles implementation (effective db v.2),
the introduction of 'SahulSed' replacing 'OSL/TL Australia' (effective v.2(1)),
the integration of the 'FosSahul' partner collection (effective v.2(1)),
the integration of the 'ExpAge' partner collection (effective v.2(1)),
major upgrades to the 'CRN INT' and 'CRN AUS' collections (effective v.2(2)),
the integration of the 'SahulArch' collection (v.2.1(2)).
Accompanying publication: Codilean, A. T., Munack, H., Saktura, W. M., Cohen, T. J., Jacobs, Z., Ulm, S., Hesse, P. P., Heyman, J., Peters, K. J., Williams, A. N., Saktura, R. B. K., Rui, X., Chishiro-Dennelly, K., and Panta, A.: OCTOPUS database (v.2), Earth Syst. Sci. Data, 14, 3695–3713, https://doi.org/10.5194/essd-14-3695-2022, 2022.
The numbers of single perpetrator relationships (unique count) are counted once for each relationship category. Perpetrators with two or more relationships are counted in the multiple relationship category. Numbers are for the most recent federal fiscal year for which data are available. To view more National Child Abuse and Neglect Data System (NCANDS) findings, click link to summary page below: https://healthdata.gov/stories/s/kaeg-w7jc
Stores physical and logical information about relational databases and record structures to assist in data identification and management.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A full dump of OCTOPUS PostgreSQL database v.2.2 as published upon
the integration of the 'SahulChar' collection (v.2.2(1)),
the integration of the 'IPPD' collection (v.2.2(1)),
major upgrades to the 'CRN INT' and 'CRN AUS' collections (effective v.2.2(3)),
upgrades to the 'CRN XXL' and 'CRN UOW' collections (effective v.2.2(3)).
Database frontend: https://octopusdata.org/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two files created for the experiments presented in the article: Publication and Maintenance of RDB2RDF Views Externally Materialized in Enterprise Knowledge Graphs.
mapR2RML_MusicBrainz_completo.txt: We created the R2RML mapping for translating MBD data into the Music Ontology vocabulary, which is used for publishing the LMB view. The LMB view was materialized using the D2RQ tool. It took 67 minutes to materialize the view with approximately 41.1 GB of NTriples. We also provided a SPARQL endpoint for querying the LMB view.
TriggersAndProcedures.txt: We created the triggers, procedures, and a Java class to implement the rules required to compute and publish the changesets.
relationalViewDefinition.pdf: This document gives details about the process of creating the relational views used in the experiments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comes from a real-world manufacturing process of a Critical Manufacturing business partner. The manufacturing process is monitored via an IoT system. The dataset has been carefully anonymized due to privacy concerns; for more details on how this process was conducted, see the accompanying thesis.
In the process that generates this data, eight different readings are taken each time a particular tool is used. Once a tool begins underperforming, it is retired and therefore does not appear again in the dataset. We believe that this dataset may be used to estimate and predict tool longevity, as it likely presents time-dependent covariates, and as such may be of use to research on multilevel survival analysis or predictive maintenance models.
Name             | Type          | Description
-----------------|---------------|------------
OperationEndTime | Numerical     | Difference in seconds from the first operation in the dataset.
ToolId           | Numerical Key | The tool used. Its value is unique to each different tool in the dataset.
Machine          | Numeric       | A categorical variable representing the machine that used the tool. Its value is unique to each different machine in the dataset.
Process          | Numeric       | A categorical variable representing the process that used the tool. Its value is unique to each different process in the dataset.
P1DataPoint1     | Numeric       | A concrete value for a reading of parameter one.
P1DataPoint2     | Numeric       | A concrete value for an error metric associated with the process that generated the value present on P1DataPoint1.
P2DataPoint1     | Numeric       | A concrete value for a reading of parameter two.
P2DataPoint2     | Numeric       | A concrete value for an error metric associated with the process that generated the value present on P2DataPoint1.
...              | ...           | ...
P8DataPoint1     | Numeric       | A concrete value for a reading of parameter eight.
P8DataPoint2     | Numeric       | A concrete value for an error metric associated with the process that generated the value present on P8DataPoint1.
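As a starting point for longevity estimates, the sketch below derives the observed usage span of each tool from the ToolId and OperationEndTime columns described above. The records are invented for illustration, and the function name is hypothetical. Note the assumption that the last recorded use marks retirement: tools still in service at the end of the observation period are censored, which a proper survival analysis would have to account for.

```python
from collections import defaultdict


def tool_lifetimes(records):
    """Map each ToolId to (first use, last use, number of uses)."""
    times = defaultdict(list)
    for record in records:
        times[record["ToolId"]].append(record["OperationEndTime"])
    return {
        tool: (min(stamps), max(stamps), len(stamps))
        for tool, stamps in times.items()
    }


# Invented sample records; OperationEndTime is in seconds since the
# first operation in the dataset, as in the column description.
records = [
    {"ToolId": 1, "OperationEndTime": 0},
    {"ToolId": 1, "OperationEndTime": 120},
    {"ToolId": 2, "OperationEndTime": 60},
    {"ToolId": 1, "OperationEndTime": 300},
    {"ToolId": 2, "OperationEndTime": 90},
]
lifetimes = tool_lifetimes(records)

# Observed span in seconds between the first and last use of each tool.
spans = {tool: last - first for tool, (first, last, _) in lifetimes.items()}
print(spans)
```

The per-use readings (P1DataPoint1 through P8DataPoint2) could then be attached to each tool as time-dependent covariates.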
https://dataintelo.com/privacy-and-policy
The global market size for non-relational databases is expected to grow from USD 10.5 billion in 2023 to USD 35.2 billion by 2032, registering a Compound Annual Growth Rate (CAGR) of 14.6% over the forecast period. This substantial growth is primarily driven by increasing demand for scalable, flexible database solutions capable of handling diverse data types and large volumes of data generated across various industries.
One of the significant growth factors for the non-relational databases market is the exponential increase in data generated globally. With the proliferation of Internet of Things (IoT) devices, social media platforms, and digital transactions, the volume of semi-structured and unstructured data is growing at an unprecedented rate. Traditional relational databases often fall short in efficiently managing such data types, making non-relational databases a preferred choice. For example, document-oriented databases like MongoDB allow for the storage of JSON-like documents, offering flexibility in data modeling and retrieval.
Another key driver is the increasing adoption of non-relational databases among enterprises seeking agile and scalable database solutions. The need for high-performance applications that can scale horizontally and handle large volumes of transactions is pushing businesses to shift from traditional relational databases to non-relational databases. This is particularly evident in sectors like e-commerce, where the ability to manage customer data, product catalogs, and transaction histories in real-time is crucial. Additionally, companies in the BFSI (Banking, Financial Services, and Insurance) sector are leveraging non-relational databases for fraud detection, risk management, and customer relationship management.
The advent of cloud computing and the growing trend of digital transformation are also significant contributors to the market growth. Cloud-based non-relational databases offer numerous advantages, including reduced infrastructure costs, scalability, and ease of access. As more organizations migrate their operations to the cloud, the demand for cloud-based non-relational databases is set to rise. Moreover, the availability of Database-as-a-Service (DBaaS) offerings from major cloud providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) is simplifying the deployment and management of these databases, further driving their adoption.
Regionally, North America holds the largest market share, driven by the early adoption of advanced technologies and the presence of major market players. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. The rapid digitalization, growing adoption of cloud services, and increasing investments in IT infrastructure in countries like China and India are propelling the demand for non-relational databases in the region. Additionally, the expanding e-commerce sector and the proliferation of smart devices are further boosting market growth in Asia Pacific.
The non-relational databases market is segmented into several types, including Document-Oriented Databases, Key-Value Stores, Column-Family Stores, Graph Databases, and Others. Each type offers unique functionalities and caters to specific use cases, making them suitable for different industry requirements. Document-Oriented Databases, such as MongoDB and CouchDB, store data in document format (e.g., JSON or BSON), allowing for flexible schema designs and efficient data retrieval. These databases are widely used in content management systems, e-commerce platforms, and real-time analytics applications due to their ability to handle semi-structured data.
Key-Value Stores, such as Redis and Amazon DynamoDB, store data as key-value pairs, providing extremely fast read and write operations. These databases are ideal for caching, session management, and real-time applications where speed is critical. They offer horizontal scalability and are highly efficient in managing large volumes of data with simple query requirements. The simplicity of the key-value data model and its performance benefits make it a popular choice for high-throughput applications.
Column-Family Stores, such as Apache Cassandra and HBase, store data in columns rather than rows, allowing for efficient storage and retrieval of large datasets. These databases are designed to handle massive amounts of data across distributed systems, making them suitable for use cases involving big data analytics, time-seri
Distance-power relationship data we used to create and evaluate a protocol to estimate population density, which can be used to compute abundance of terrestrial sound-producing animals from single automatic acoustic recorders and using an automatic detection algorithm. First posted - January 18, 2017 (available from author) Revised - August 22, 2018 (version 1.1)
https://www.datainsightsmarket.com/privacy-policy
The global public cloud non-relational databases and NoSQL database market is projected to reach $24,908.32 million by 2033, exhibiting a CAGR of 16.8% during the forecast period (2023-2033). Factors such as the increasing adoption of cloud-based technologies, surging demand for data analytics, and growing need for flexible and scalable databases are driving the market growth. The key types of NoSQL databases include key-value storage, column storage, document database, and graph database. Among these, the key-value storage database segment currently holds the largest market share due to its simplicity, speed, and scalability. Regionally, North America is expected to dominate the market throughout the forecast period, owing to the high adoption of cloud-based technologies and presence of leading technology companies. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period, driven by the increasing demand for data analytics solutions and growing awareness of NoSQL databases. Key players in the market include IBM, MongoDB Inc, AWS, Apache Software Foundation, Neo Technologies (Pty) Ltd, InterSystems, Google, Oracle Corporation, Teradata, DataStax, and Software AG. These companies are focusing on innovation and partnerships to expand their market presence and meet the evolving needs of customers.
https://www.zionmarketresearch.com/privacy-policy
The global in-memory database market is expected to generate revenue of around USD 36.21 billion by 2032, growing at a CAGR of 19.2% between 2024 and 2032.
The “Relational Data between Parties to the UN Framework Convention on Climate Change” (Relational UNFCCC Data) dataset contains dyadic data on how parties to the UNFCCC (i.e. member states or coalitions of member states) react to other parties’ oral interventions during the negotiations. Each observation in the dataset consists of a bargaining interaction between a country dyad, at a specific negotiation day, about a specific negotiation topic. The dataset includes 62 097 dyadic bargaining interactions among the 222 participants to the UNFCCC negotiations (including countries and coalitions) over a series of 461 negotiation days between 1995 and 2013. The dataset was obtained by hand-coding the Earth Negotiation Bulletins (ENBs). It covers all meetings of the official UNFCCC bodies reported in the ENBs between February 1995 (11th Session of the INC in New York) and December 2013 (COP19 in Warsaw).