https://choosealicense.com/licenses/other/
View code: https://colab.research.google.com/drive/1rLk-mdsWsdxwQdYYJS24rAP9KABtbiqu?usp=sharing
Example:
{"messages": [
{"role": "system", "content": "You are a SQL expert assistant. Generate clear, efficient SQL queries based on user requests. Provide only the SQL query without any additional text or explanation."}
{"role": "user", "content": "What are the top 5 most popular genres of music in the database, based on the number of tracks⊠See the full description on the dataset page: https://huggingface.co/datasets/fknguedia/SQL-GENERATOR-DATASETS.
Introduction
This dataset contains SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are SQL injection for Union Query and Blind SQL injection. To perform the attacks, the SQLMAP tool was used. NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for the collection and monitoring of network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

Datasets
The first dataset was collected to train the detection models (D1); the second was collected using different attacks than those used in training, in order to test the models and ensure their generalization (D2). The datasets contain both benign and malicious traffic, and both are balanced. The version of NetFlow used to build the datasets is 5.

Dataset | Aim | Samples | Benign-malicious traffic ratio
D1 | Training | 400,003 | 50%
D2 | Test | 57,239 | 50%

Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator with the ipt_netflow sensor installed. The sensor is a Linux kernel module that uses Iptables to process packets and convert them into NetFlow flows. DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1,800 seconds (30 minutes).

Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts; users may customize them or incorporate their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it forwards them to a NetFlow data generation node (packets received from the Internet are handled in the same way).

The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration-testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities. The attacks were executed from 16 nodes, each launching SQLMAP with the parameters in the following table.

Parameters | Description
'--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments' | Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5 | Increase the probability of a false positive identification
--risk=3 | Increase the probability of extracting data
--random-agent | Select the User-Agent randomly
--batch | Never ask for user input, use the default behavior
--answers="follow=Y" | Predefined answers to yes

Every node executed SQLIA on 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either the MySQL or the SQLServer database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQLServer). The web service was accessible on ports 443 and 80, the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24.
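For illustration, the parameters listed above can be assembled into a single SQLMAP invocation along the following lines. This is a sketch, not the exact command used to build the dataset; the target URL is a placeholder.

import subprocess

target = "http://192.0.2.10/form.php?id=1"  # placeholder victim web form, not from the dataset
args = [
    "sqlmap", "-u", target,
    # enumeration flags listed in the parameter table above
    "--banner", "--current-user", "--current-db", "--hostname", "--is-dba",
    "--users", "--passwords", "--privileges", "--roles", "--dbs", "--tables",
    "--columns", "--schema", "--count", "--dump", "--comments",
    # scan depth and non-interactive behaviour, as described above
    "--level=5", "--risk=3", "--random-agent", "--batch", "--answers=follow=Y",
]
subprocess.run(args, check=False)  # requires sqlmap to be installed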
The malicious traffic in the test set was collected under different conditions. For D1, SQLIA was performed using Union attacks against the MySQL and SQLServer databases. For D2, however, Blind SQL injection attacks were performed against a web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1: in D2, the address space was 152.148.48.1/24 for the benign and malicious traffic-generating nodes and 140.30.20.1/24 for the victim nodes. The MySQL server was run using MariaDB version 10.4.12; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.
https://www.archivemarketresearch.com/privacy-policy
The global database testing tool market is anticipated to experience substantial growth in the coming years, driven by factors such as the increasing adoption of cloud-based technologies, the rising demand for data quality and accuracy, and the growing complexity of database systems. The market is expected to reach a value of USD 1,542.4 million by 2033, expanding at a CAGR of 7.5% during the forecast period of 2023-2033. Key players in the market include Apache JMeter, DbFit, SQLMap, Mockup Data, SQL Test, NoSQLUnit, Orion, ApexSQL, QuerySurge, DBUnit, DataFactory, DTM Data Generator, Oracle, SeLite, SLOB, and others. The North American region is anticipated to hold a significant share of the database testing tool market, followed by Europe and Asia Pacific. The increasing adoption of cloud-based database testing services, the presence of key market players, and the growing demand for data testing and validation are driving the market growth in North America. Asia Pacific, on the other hand, is expected to experience the highest growth rate due to the rapidly increasing IT spending, the emergence of new technologies, and the growing number of businesses investing in data quality management solutions.
ckanext-sql
Due to the absence of a README file in the provided GitHub repository for ckanext-sql, a comprehensive understanding of its features, integration, and benefits is unfortunately not available. Typically, an extension named 'sql' would likely bridge CKAN with SQL databases, potentially enabling users to query and interact with datasets stored in SQL-compatible databases directly from within CKAN. However, lacking specific documentation, definitive claims about its capabilities cannot be made.

Potential Key Features (based on the name and typical use cases):
* SQL Query Interface: Hypothetically, this extension might offer an interface within CKAN to run SQL queries against linked datasets.
* Data Visualization from SQL: Potentially, it could allow generating visualizations directly from data retrieved via SQL queries.
* SQL Data Import: It is possible that the extension provides functionality to import data from SQL databases into CKAN datasets.
* Federated Queries: The extension may implement the capability to run federated queries across datasets stored as CKAN resources and external databases.
* SQL Data Export: It may offer the ability to export CKAN data to a SQL database.
* SQL-based resource views: Speculatively, it could add resource views that display data retrieved from SQL.

Potential Use Cases (based on the name):
1. Direct Data Analysis: Data analysts might use this to directly query and analyze data stored in SQL databases via CKAN, skipping manual data imports.
2. Database Integration: Organizations that already maintain large databases could use this extension to provide easier access to that data through a CKAN portal.

Technical Integration (Hypothetical): Given the name, the 'sql' extension likely integrates with CKAN by adding new API endpoints or UI elements that allow users to specify SQL connections and queries. It would probably require configuration settings to define database connection parameters. It might also integrate with CKAN's resource view system, enabling custom visualizations.

Potential Benefits & Impact (Speculative): If the extension functions as its name suggests, it would offer direct access to SQL data within the CKAN environment, reduce the need for data duplication (by querying directly rather than importing), and potentially enhance data analysis and visualization capabilities. The extension could become a valuable part of data-analytics workflows involving CKAN. However, due to the lack of a README.md, this analysis remains at a theoretical level.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
synthetic_text_to_sql
gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes:
- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct…

See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.
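A minimal sketch of loading the dataset with the Hugging Face datasets library, assuming the train/test split names match the partition described above:

from datasets import load_dataset

ds = load_dataset("gretelai/synthetic_text_to_sql")
print(ds["train"].num_rows, ds["test"].num_rows)  # expected 100000 and 5851, per the card above
print(ds["train"][0])                             # inspect one synthetic text-to-SQL sample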
https://researchintelo.com/privacy-and-policy
According to our latest research, the global Cloud SQL market size in 2024 stands at USD 7.8 billion, reflecting robust adoption across industries. The market is poised for significant expansion, projected to reach USD 32.5 billion by 2033, growing at a compelling CAGR of 17.2% during the forecast period. This remarkable growth is primarily driven by the increasing demand for scalable, flexible, and cost-efficient database management solutions that support digital transformation initiatives worldwide.
A primary growth factor for the Cloud SQL market is the accelerating shift toward cloud-based infrastructure in organizations of all sizes. Enterprises are increasingly migrating their data workloads to the cloud to leverage benefits such as reduced operational costs, enhanced scalability, and improved data accessibility. Cloud SQL solutions, with their managed database services, eliminate the need for manual database maintenance and updates, thereby allowing IT teams to focus on core business activities. Furthermore, the proliferation of data from IoT devices, mobile applications, and digital services is generating an unprecedented amount of structured and unstructured data, necessitating robust database solutions that can seamlessly scale with demand. As organizations prioritize agility and innovation, the adoption of Cloud SQL platforms is becoming integral to their IT strategies.
Another significant driver is the growing emphasis on data security, compliance, and disaster recovery. Cloud SQL services offer advanced security features, including data encryption, automated backups, and multi-region replication, ensuring business continuity and regulatory compliance. The rise in cyber threats and stringent data protection regulations such as GDPR and HIPAA have made secure data management a top priority for enterprises. By leveraging Cloud SQL, organizations can mitigate the risks associated with data breaches and ensure that their critical business information is protected against potential threats. Additionally, the ability to automate backup and recovery processes reduces downtime and safeguards against data loss, further enhancing the value proposition of cloud-based SQL databases.
The integration of advanced analytics and artificial intelligence is also catalyzing the expansion of the Cloud SQL market. Organizations are increasingly harnessing the power of business intelligence and analytics tools to extract actionable insights from their data. Cloud SQL platforms facilitate seamless integration with analytics solutions, enabling real-time data processing and visualization. This capability is particularly valuable for industries such as retail, healthcare, and BFSI, where timely insights can drive better decision-making and competitive advantage. As digital transformation accelerates, the need for agile, intelligent, and data-driven operations will continue to fuel the adoption of Cloud SQL solutions across diverse sectors.
From a regional perspective, North America currently dominates the Cloud SQL market, accounting for the largest share in 2024, driven by the presence of leading cloud service providers, rapid technological advancements, and high digital adoption rates. Europe follows closely, propelled by stringent data privacy regulations and strong demand from sectors such as BFSI and healthcare. The Asia Pacific region is anticipated to witness the fastest growth, with a CAGR exceeding 19%, fueled by increasing cloud adoption among SMEs, government digitalization initiatives, and a burgeoning IT services sector. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, with growing investments in cloud infrastructure and digital transformation projects.
The Cloud SQL market is broadly segmented by database type into relational and non-relational databases. Relational databases, such as MySQL, PostgreSQL, and Microsoft SQL Server, continue to dominate the market due to their widespread use in transactional applications and enterprise workloads. These databases are prized for their ability to maintain data integrity, support complex queries, and provide consistent performance. Businesses in industries like BFSI, healthcare, and retail rely heavily on relational databases for mission-critical applications where data accuracy and reliability are paramount. The demand for managed relational database services in the cloud is further boosted by the need for seamless migration fr
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL queries answering the question using the CREATE statement as context. This dataset was built with text-to-SQL LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names, column… See the full description on the dataset page: https://huggingface.co/datasets/b-mc2/sql-create-context.
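For a concrete sense of the format, here is a toy record sketched in Python. The field names (question, context, answer) and the schema are assumptions for illustration only; see the dataset page for the exact columns.

# A toy record in the shape described above; field names and content are
# illustrative assumptions, not taken from the dataset itself.
example = {
    "question": "How many employees work in each department?",
    "context": "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)",
    "answer": "SELECT department, COUNT(*) FROM employees GROUP BY department",
}

# A text-to-SQL model is trained to produce `answer` from `question` plus
# `context`, so table and column names are grounded in the schema rather
# than hallucinated.
print(example["answer"])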
https://www.archivemarketresearch.com/privacy-policy
The Online Transaction Processing (OLTP) market is experiencing robust growth, driven by the increasing adoption of cloud-based solutions, the proliferation of mobile and IoT devices generating massive transactional data, and the rising demand for real-time data processing across diverse industries. Let's assume, for illustrative purposes, a 2025 market size of $150 billion, with a Compound Annual Growth Rate (CAGR) of 12% projected for the forecast period of 2025-2033. This signifies a substantial expansion of the market, reaching an estimated value exceeding $400 billion by 2033. Key drivers include the need for enhanced operational efficiency, improved customer experience through faster transaction processing, and the ability to leverage real-time data for informed decision-making. The increasing adoption of advanced technologies like in-memory databases and distributed databases further fuels this growth. Significant trends shaping the OLTP market include the shift towards cloud-based deployment models, owing to their scalability, cost-effectiveness, and ease of management. The growing demand for high-availability and fault-tolerant systems is also pushing innovation in database technologies. The integration of artificial intelligence (AI) and machine learning (ML) for predictive analytics and fraud detection within OLTP systems is another key trend gaining momentum. While the market faces certain restraints like data security concerns, integration complexities, and the need for skilled professionals, the overall growth trajectory remains positive, driven by strong market demand and technological advancements. The segment analysis shows a significant contribution from cloud-based OLTP solutions, with the market being highly competitive, with key players constantly innovating to maintain their market share.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication package for the paper:
Ludovic Courtès, Timothy Sample, Simon Tournier, Stefano Zacchiroli. Source Code Archiving to the Rescue of Reproducible Deployment. ACM REP'24, June 18-20, 2024, Rennes, France. https://doi.org/10.1145/3641525.3663622
Generating the paper
The paper can be generated using the following command:
guix time-machine -C channels.scm \
  -- shell -C -m manifest.scm \
  -- make
This uses GNU Guix to run make in the exact same computational environment used when preparing the paper. The computational environment is described by two files. The channels.scm file specifies the exact version of the Guix package collection to use. The manifest.scm file selects a subset of those packages to include in the environment.
It may be possible to generate the paper without Guix. To do so, you will need the following software (on top of a Unix-like environment):
GNU Make
SQLite 3
GNU AWK
Rubber
Graphviz
TeXLive
Structure
data/ contains the data examined in the paper
scripts/ contains dedicated code for the paper
logs/ contains logs generated during certain computations
Preservation of Guix
Some of the claims in the paper come from analyzing the Preservation of Guix (PoG) database as published on January 26, 2024. This database is the result of years of monitoring the extent to which the source code referenced by Guix packages is archived. This monitoring has been carried out by Timothy Sample who occasionally publishes reports on his personal website: https://ngyro.com/pog-reports/latest/. The database included in this package (data/pog.sql) was downloaded from https://ngyro.com/pog-reports/2024-01-26/pog.db and then exported to SQL format. In addition to the SQL file, the database schema is also included in this package as data/schema.sql.
The database itself is largely the result of scripts, but also of manual adjustments (where necessary or convenient). The scripts are available at https://git.ngyro.com/preservation-of-guix/, which is preserved in the Software Heritage archive as well: https://archive.softwareheritage.org/swh:1:snp:efba3456a4aff0bc25b271e128aa8340ae2bc816;origin=https://git.ngyro.com/preservation-of-guix. These scripts rely on the availability of source code in certain locations on the Internet, and therefore will not yield exactly the same result when run again.
Analysis
Here is an overview of how we use the PoG database in the paper. The exact way it is queried to produce graphs and tables for the paper is laid out in the Makefile.
The pog-types.sql query gives the counts of each source type (e.g. 'git' or 'tar-gz') for each commit covered by the database.
The pog-status.sql query gives the archival status of the sources by commit. For each commit, it produces a count of how many sources are stored in the Software Heritage archive, missing from it, or unknown if stored or missing. The pog-status-total.sql query does the same thing but over all sources without sorting them into individual commits.
The disarchive-ratio.sql query estimates the success rate of Disarchive disassembly.
Finally, the swhid-ratio.sql query gives the proportion of sources for which the PoG database has an SWHID.
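For orientation, here is a sketch of how the shipped dump could be loaded into SQLite and queried in the spirit of pog-types.sql. The table and column names used below (sources, commit_hash, type) are guesses for illustration only; the real schema is in data/schema.sql and the real queries ship with this package.

import sqlite3

conn = sqlite3.connect(":memory:")
with open("data/pog.sql") as dump:
    conn.executescript(dump.read())

# Count source types per commit (hypothetical table/column names).
for commit_hash, source_type, count in conn.execute(
    "SELECT commit_hash, type, COUNT(*) FROM sources GROUP BY commit_hash, type"
):
    print(commit_hash, source_type, count)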
Estimating missing sources
The Preservation of Guix database only covers sources from a sample of commits to the Guix repository. This greatly simplifies the process of collecting the sources at the risk of missing a few. We estimate how many are missed by searching Guix's Git history for Nix-style base-32 hashes. The result of this search is compared to the hashes in the PoG database.
A naïve search of Git history results in an overestimate due to Guix's branch development model: we find hashes that were never exposed to users of 'guix pull'. To work around this, we also approximate the history of commits available to 'guix pull'. We do this by scraping push events from the guix-commits mailing list archives (data/guix-commits.mbox). Unfortunately, those archives are not quite complete. Missing history is reconstructed in the data/missing-links.txt file.
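A rough illustration of the hash search is sketched below; it is not the paper's actual script. Nix-style base-32 strings use an alphabet that omits e, o, t and u, and a sha256 hash is 52 characters long.

import re
import subprocess

# Nix/Guix base-32 alphabet: 0-9 plus a-z without e, o, t, u; 52 chars for sha256.
NIX_BASE32_SHA256 = re.compile(r"\b[0-9a-df-np-sv-z]{52}\b")

log = subprocess.run(
    ["git", "-C", "/path/to/guix", "log", "-p"],  # path to a Guix checkout (see below)
    capture_output=True, text=True, errors="replace",
).stdout

hashes = set(NIX_BASE32_SHA256.findall(log))
print(len(hashes), "candidate source hashes found in the history")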
This estimate requires a copy of the Guix Git repository (not included in this package). The repository can be obtained from GNU at https://git.savannah.gnu.org/git/guix.git or from the Software Heritage archive: https://archive.softwareheritage.org/swh:1:snp:9d7b8dcf5625c17e42d51357848baa226b70e4bb;origin=https://git.savannah.gnu.org/git/guix.git. Once obtained, its location must be specified in the Makefile.
To generate the estimate, use:
guix time-machine -C channels.scm \
  -- shell -C -m manifest.scm \
  -- make data/missing-sources.txt
If not using Guix, you will need additional software beyond what is used to generate the paper:
GNU Guile
GNU Bash
GNU Mailutils
GNU Parallel
Measuring link rot
In order to measure link rot, we ran Guix Scheme scripts, i.e., scripts that use Guix as a Scheme library. The scripts depend on the state of the world at the very specific moment they were run; hence, it is not possible to reproduce exactly the same outputs. However, their tendency over time should be very similar. Running them requires an installation of Guix. For instance,
guix repl -q scripts/table-per-origin.scm
When running these scripts for the paper, we tracked their output and saved it inside the logs directory.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This is a set of SQL databases with information about molecules and radicals with the following database conventions and content.
Each database is named 'CnHm', where n and m range from 1 to 5 and indicate the number of carbon (C) and hydrogen (H) atoms in the structures in the database.
Each database contains entries for a large number of 'CnHm' geometries. Within each database are four tables.

Table 'meta'
Contains the name of the database and the date it was created.

Table 'xyz'
The columns of the 'xyz' table are the following:
'id': a numerical identification number, integer
'calc_params': metadata describing the level of theory and other details of the quantum chemical calculations to generate the Hessian used for generating this structure, numpy array stored as a blob
'calc': software used for the calculation, string
'temp': the temperature used during the normal mode sampling process to generate the structure, in K; outliers are assigned negative temperature, float
'name': a unique name describing the anchor point the structure was generated from, string
'dist': the normalized unitless distance of the structure from its anchor point at the temperature given in 'temp', float
'geometry': atomic coordinates in angstroms, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic positions are listed with carbons first, followed by hydrogens.
'created_at': date the structure was generated

Table 'energy'
The columns of the 'energy' table are the following:
'id': a numerical identification number, not linked to the 'id' in 'xyz', integer
'fidelity': the fidelity level the energy was calculated at, integer
0 = B3LYP/6-31+G(d)
1 = wB97X-D/6-311++G(d,p)
2 = HF/6-31G
3 = B3LYP/6-31G
4 = B2PLYPD3/6-311++G(d,p)
'E': molecular energy in eV, float
'xyz_id': the 'id' of the geometry in the 'xyz' table that this energy was calculated for, integer
'hessian': empty
'forces': atomic forces in eV/angstrom, (n+m) by 3 numpy array of floats stored as a blob. Note that the atomic forces are listed in the same order as the atoms in the 'geometry' in the 'xyz' table
'calc_params': metadata describing the level of theory and other details of the energy and force calculations of this entry, numpy array stored as a blob
'calc': software used for the energy and force calculation
'created_at': date the energy and forces were calculated
'sample_set_id': empty

Table 'aev': currently empty
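A minimal Python sketch of reading one of these databases follows. The file name ('C1H4.db') is an example, and since the exact serialization of the numpy blobs is not stated above, the np.frombuffer(...).reshape(-1, 3) step is an assumption.

import sqlite3
import numpy as np

conn = sqlite3.connect("C1H4.db")  # example database name: 1 carbon, 4 hydrogens

# Join one energy entry (fidelity 1 = wB97X-D/6-311++G(d,p)) to its geometry.
row = conn.execute(
    "SELECT xyz.id, xyz.geometry, energy.E, energy.forces "
    "FROM energy JOIN xyz ON energy.xyz_id = xyz.id "
    "WHERE energy.fidelity = 1 LIMIT 1"
).fetchone()

xyz_id, geom_blob, energy_ev, forces_blob = row
# Assumed blob layout: raw float64 buffer reshaped to (n+m) x 3.
geometry = np.frombuffer(geom_blob, dtype=np.float64).reshape(-1, 3)  # angstroms
forces = np.frombuffer(forces_blob, dtype=np.float64).reshape(-1, 3)  # eV/angstrom
print(xyz_id, energy_ev, geometry.shape, forces.shape)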
https://www.marketreportanalytics.com/privacy-policy
The NoSQL database market is experiencing robust growth, driven by the increasing demand for scalable, flexible, and high-performance data solutions to manage the explosion of unstructured and semi-structured data. The market, currently estimated at $50 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching approximately $250 billion by 2033. This expansion is fueled by several key factors. The adoption of cloud computing and microservices architectures is significantly contributing to the demand for NoSQL databases, which offer superior scalability and agility compared to traditional relational databases. Furthermore, the rise of big data analytics and the Internet of Things (IoT) are generating massive volumes of data, necessitating databases capable of handling such scale and variety. The diverse application segments, including enterprise applications, government initiatives, and others, are further propelling market growth. Key players like Microsoft, IBM, Oracle, Amazon, and Google are heavily investing in developing and enhancing their NoSQL database offerings, intensifying competition and fostering innovation. The market segmentation reveals strong growth potential in both application and database type. New architectures, offering greater flexibility and scalability, are leading the type segment, while the enterprise sector dominates in terms of applications, followed by the government sector. However, the "Others" category demonstrates substantial potential for growth as NoSQL databases find wider adoption across various industry verticals. Geographical distribution shows a concentration in North America and Europe, reflecting early adoption in mature markets. However, significant growth opportunities exist in Asia Pacific, particularly in China and India, where digital transformation and technological advancements are accelerating. While competition is intense, the market's large size and potential for continued expansion indicate ample opportunities for both established players and emerging niche providers. The restraints facing the market are mainly associated with the complexity of NoSQL database management, the need for specialized expertise, and the potential for data security concerns. However, these challenges are gradually being addressed through advancements in database technology and management tools.
https://www.datainsightsmarket.com/privacy-policy
The Online Transaction Processing (OLTP) market is experiencing robust growth, driven by the increasing adoption of cloud-based solutions and the expanding need for real-time data processing across diverse industries. The market's value is estimated to be $50 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 15% from 2019 to 2024. This growth is fueled by several key factors. Firstly, the proliferation of mobile and internet-connected devices is generating massive volumes of transactional data, demanding efficient and scalable OLTP systems. Secondly, the rise of e-commerce, digital payments, and real-time applications across sectors like finance, healthcare, and logistics necessitates high-performance OLTP solutions capable of handling millions of transactions per second. Finally, the shift towards cloud computing provides organizations with flexible, cost-effective, and scalable OLTP infrastructure, further boosting market expansion. Major players like Oracle, IBM (DB2), and Amazon (Aurora) are fiercely competing in this space, continually innovating to offer enhanced performance, security, and scalability. However, the market also faces certain challenges. The complexity of implementing and managing OLTP systems, coupled with the need for specialized skills, can pose significant hurdles for smaller organizations. Data security and privacy concerns remain paramount, requiring robust security measures to protect sensitive transactional data. Furthermore, the increasing demand for integration with diverse data sources and applications necessitates interoperability and seamless data exchange capabilities. Despite these challenges, the long-term outlook for the OLTP market remains positive, with projected growth driven by ongoing digital transformation initiatives across various industries and the continued development of advanced technologies like in-memory computing and distributed databases. The competitive landscape is dynamic, with established players and emerging startups vying for market share through innovation and strategic partnerships.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a diverse collection of pre-processed flow cytometry data assembled
to support the training and evaluation of machine learning (ML) models for the gating of
cell populations. The data was curated through a citizen science initiative embedded in
the EVE Online video game, known as Project Discovery. Participants contributed to
scientific research by gating bivariate plots generated from flow cytometry data, creating
a crowdsourced reference set. The original flow cytometry datasets were sourced from
publicly available COVID-19 and immunology-related studies on FlowRepository.org and
PubMed. Data were compensated, transformed, and split into bivariate plots for analysis.
This dataset includes:
1) CSV files containing two-channel marker combinations per plot,
2) a SQL database capturing player-generated gating polygons in normalized coordinates,
3) scripts and containerized environments (Singularity and Docker) for reproducible evaluation of gating accuracy and consensus scoring using the flowMagic pipeline,
4) code for filtering bot inputs, evaluating user submissions, calculating F1 scores, and generating consensus gating regions.
This data is especially valuable for training and benchmarking models that aim to automate the labor-intensive gating process in immunological and clinical cytometry applications.
https://www.datainsightsmarket.com/privacy-policy
The Big Data Analytics market within the Defense and Aerospace sectors is experiencing robust growth, driven by the increasing need for advanced intelligence gathering, predictive maintenance, and improved operational efficiency. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching approximately $45 billion by 2033. Key drivers include the proliferation of sensor data, the rise of unmanned aerial vehicles (UAVs) generating massive datasets, and the demand for real-time situational awareness. Furthermore, the adoption of cloud-based solutions and the development of sophisticated AI/ML algorithms for data analysis are significantly accelerating market expansion. The defense segment currently holds a larger market share compared to aerospace, but both segments are experiencing parallel growth. Within the application segments, predictive maintenance and intelligence analysis are high-growth areas, while within the technology segments, cloud-based solutions and NoSQL databases are experiencing strong adoption. The North American region currently dominates the market due to significant investments in defense and aerospace technologies, followed by Europe and Asia-Pacific. Despite the promising outlook, several restraints exist, including data security concerns, the high cost of implementation and maintenance of big data analytics solutions, and the need for skilled professionals capable of managing and interpreting complex data sets. However, these challenges are being actively addressed through the development of robust cybersecurity protocols, cloud-based cost optimization strategies, and increased investment in training and education programs. The competitive landscape is highly fragmented, with a mix of established technology giants like IBM, Microsoft, and Google, alongside specialized defense contractors and emerging analytics companies. Strategic partnerships and acquisitions are expected to further shape the market dynamics in the coming years, driving innovation and expanding market reach.
Scripts for Analyses
This file contains the instructions for generating the appropriate MySQL databases and the scripts required to process this data and recreate the analyses described in Evans et al.
Scripts.tar.gz

Alignment Files
This file contains the folder system storing the alignments (.nex) and Maximum-Likelihood tree files (_tree.tre) for the sequences presented in Evans et al.
Each file is named according to the seq_number information as in the database basic_db.sql (also contained within this DataDryad submission).
Each alignment contains only the sequence from the first two codon positions as reported in Evans et al.
Alignment_files.tar.gz

MySQL Database including data
This file contains a MySQL database including all the data used in Evans et al. It is in the same format as the database which can be created using the files available in the Scripts package associated with this DataDryad package.
basic_db.sql
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is sourced from CSIRO Parkes ATNF, e.g. http://www.atnf.csiro.au/research/pulsar/psrcat/

Feel the pulse of the universe
We're taking signal data from astronomical "pulsar" sources and creating a way to listen to their signals audibly. Pulsar data is available from ATNF at CSIRO.au. Our team at #SciHackMelb has been working on a #datavis to give researchers and others a novel way to explore the Pulsar corpus, especially through the sound of the frequencies at which the Pulsars emit pulses.

Link to project page at #SciHackMelb - http://www.the-hackfest.com/events/melbourne-science-hackfest/projects/pulsar-voices/

The files attached here include: source data, project presentation, data as used in the website (final_pulsar.sql), and other methodology documentation. Importantly, see the Github link, which contains data manipulation code, html code to present the data and render it audibly, and an iPython Notebook to process single pulsar data into an audible waveform file. Together all these resources are the Pulsar Voices activity and resulting data.

Source Data:
* RA - east/west coordinates (0 - 24 hrs, roughly equates to longitude) [theta; transforms RA to 0 - 360°]
* Dec - north/south coordinates (-90, +90; roughly equates to latitude, i.e. 90 is above the north pole and -90 the south pole)
* P0 - the time in seconds that a pulsar repeats its signal
* f - 1/P0, which ranges from 700 cycles per second down to pulses which occur every few seconds
* kps - distance from Earth in kilo-parsecs. 1 kps = 3,000 light years. The furthest data is 30 kps. The galactic centre is about 25,000 light years away, i.e. about 8 kps.

Files:
* psrcatShort.csv - 2,295 Pulsars: all known pulsars with the above fields RA, Dec, Theta
* psrcatMedium.csv - adds P0 and kps, only 1,428 lines (i.e. not available for all 2,295 datapoints)
* psrcatSparse.csv - adds P0 and kps, blanks if n/a, 2,295 lines
* short.txt - important pulsars with high levels of observation (** even more closely examined)
* pulsar.R - code contributed by Ben Raymond to visualise Pulsar frequency and period in a histogram
* pulsarVoices_authors.JPG - photo of authors from SciHackMelb

Added to the raw data:
- Coordinates to map RA, Dec to screen width(y)/height(x): y = RA[Theta]*width/360; x = (Dec + 90)*height/180
- Audible frequency converted from Pulsar frequency (1/P0). Formula for 1/P0(x) -> Hz(y): y = 10 ^ (0.5 log(x) + 2.8). Explanation in the text file Convert1/P0toHz.txt. Tone generator from: http://www.softsynth.com/webaudio/tone.php
- Detailed audible waveform file converted from Pulsar signal data, and a waveform image (a Python notebook to generate these is available).

The project source is hosted on github at: https://github.com/gazzar/pulsarvoices
An IPython/Jupyter notebook contains code and a rough description of the method used to process a psrfits .sf file downloaded via the CSIRO Data Access Portal at http://doi.org/10.4225/08/55940087706E1. The notebook contains experimental code to read one of these .sf files and access the contained spectrogram data, processing it to generate an audible signal. It also reads the .txt files containing columnar pulse phase data (which is also contained in the .sf files) and processes these by frequency modulating the signal with an audible carrier. This is the method used to generate the .wav and .png files used in the web interface: https://github.com/gazzar/pulsarvoices/blob/master/ipynb/hackfest1.ipynb

A standalone python script that does the .txt to .png and .wav signal processing was used to process 15 more pulsar data examples.
These can be reproduced by running the script: https://github.com/gazzar/pulsarvoices/blob/master/data/pulsarvoices.py
Processed files at: https://github.com/gazzar/pulsarvoices/tree/master/web, e.g. https://github.com/gazzar/pulsarvoices/blob/master/web/J0437-4715.png (J0437-4715.wav | J0437-4715.png)

#Datavis online at: http://checkonline.com.au/tooltip.php. Code at the Github linked above; see especially https://github.com/gazzar/pulsarvoices/blob/master/web/index.php, particularly lines 314 - 328 (or search: "SELECT * FROM final_pulsar"), which loads pulsar data from the DB and pushes it to the screen with Hz on mouseover.

Pulsar Voices webpage functions:
1. There is sound when you run the mouse across the Pulsars. We plot all known pulsars (N=2,295), and play a tone for pulsars where we had data on frequency, i.e. about 75%.
2. In the bottom left corner a more detailed Pulsar sound and wave image pops up when you click the star icon. Two of the team worked exclusively on turning a single pulsar's waveform into an audible wav file. They created 16 of these files and a workflow, but the team only had time to load one waveform. With more time, it would be great to load these files.
3. If you leave the mouse over a Pulsar, a little data description pops up, with location (RA, Dec), distance (kilo-parsecs; 1 = 3,000 light years), and frequency of rotation (and Hz converted to human hearing).
4. If you click on a Pulsar, other pulsars with similar frequency are highlighted in white. With more time it would be interesting to see if there are harmonics between pulsars, i.e. related frequencies.

The Team
Michael Walker: orcid.org/0000-0003-3086-6094; Biosciences PhD student, Unimelb, Melbourne.
Richard Ferrers: orcid.org/0000-0002-2923-9889; ANDS Research Data Analyst, Innovation/Value Researcher, Melbourne.
Sarath Tomy: http://orcid.org/0000-0003-4301-0690; La Trobe PhD Comp Sci, Melbourne.
Gary Ruben: http://orcid.org/0000-0002-6591-1820; CSIRO Postdoc at Australian Synchrotron, Melbourne.
Christopher Russell: Data Manager, CSIRO, Sydney. https://wiki.csiro.au/display/ASC/Chris+Russell
Anderson Murray: orcid.org/0000-0001-6986-9140; Physics Honours, Monash, Melbourne.
Contact: richard.ferrers@ands.org.au for more information.

What is still left to do?
* load data, description, images fileset to figshare :: DOI; DONE except DOI
* add overview images as option, e.g. frequency bi-modal histogram
* colour code pulsars by distance; DONE
* add pulsar detail sound to Top three Observants; 16 pulsars processed but not loaded
* add tones to pulsars to indicate f; DONE
* add tooltips to show location, distance, frequency, name; DONE
* add title and description; DONE
* project data onto a planetarium dome with interaction to play pulsar frequencies; DONE - see youtube video at https://youtu.be/F119gqOKJ1U
* zoom into parts of sky to get separation between close data points - see youtube; function in Google Earth #datavis of dataset. Link at youtube.
* set upper and lower tone boundaries, so tones aren't annoying
* colour code pulsars by frequency bins e.g. >100 Hz, 10 - 100, 1 - 10,
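The two numeric mappings quoted above translate directly into code. Below is a small Python sketch of them; the screen dimensions are placeholders, and the formula's log is taken as log10, which keeps typical pulsar frequencies inside the audible range.

import math

def to_screen(ra_theta_deg, dec_deg, width=800, height=400):
    # y = RA[Theta] * width / 360 ; x = (Dec + 90) * height / 180
    y = ra_theta_deg * width / 360.0
    x = (dec_deg + 90.0) * height / 180.0
    return x, y

def audible_hz(pulse_freq_hz):
    # y = 10 ^ (0.5 * log10(x) + 2.8), assuming log means log10
    return 10 ** (0.5 * math.log10(pulse_freq_hz) + 2.8)

print(audible_hz(1 / 0.0057))  # a millisecond pulsar like J0437-4715 (P0 about 5.7 ms)
print(audible_hz(1.0))         # a slow, roughly one-second pulsar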
https://www.datainsightsmarket.com/privacy-policy
The In-Memory Data Grid (IMDG) market is experiencing robust growth, projected to reach $3.80 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 19.23% from 2025 to 2033. This expansion is fueled by the increasing demand for real-time data processing and analytics across diverse sectors. The rise of big data, the proliferation of Internet of Things (IoT) devices generating massive data streams, and the need for faster, more efficient applications are key drivers. Cloud deployment models are gaining traction, offering scalability and cost-effectiveness compared to on-premise solutions. The BFSI (Banking, Financial Services, and Insurance), IT and Telecommunications, and Retail sectors are significant adopters, leveraging IMDGs for high-frequency trading, fraud detection, personalized customer experiences, and supply chain optimization. However, factors such as the complexity of implementation and the need for specialized expertise can restrain wider adoption. The market is segmented by component (solutions, services), deployment type (on-premise, cloud), and end-user industry, presenting opportunities for vendors specializing in specific niches. Competition is intense, with established players like IBM, Oracle, and TIBCO alongside agile companies like GigaSpaces and Hazelcast vying for market share. Future growth will likely be driven by advancements in technology, such as improved data security and integration with emerging technologies like AI and machine learning. The competitive landscape is dynamic, with both established players and innovative startups vying for market dominance. Strategic partnerships, acquisitions, and technological innovations are shaping the competitive dynamics. The continuous evolution of cloud computing and the increasing adoption of microservices architecture further fuel the demand for IMDGs. The market's future trajectory will likely be defined by the ability of vendors to deliver scalable, secure, and easy-to-integrate solutions tailored to the specific needs of different industries and deployment environments. Geographical growth will vary, with North America and Europe expected to maintain strong market share due to early adoption, while the Asia-Pacific region is projected to experience significant growth due to increasing digitalization and technological advancements. This report provides a comprehensive analysis of the In-Memory Data Grid (IMDG) industry, offering a detailed examination of market trends, key players, and future growth prospects. The study covers the period from 2019 to 2033, with 2025 serving as the base and estimated year. It projects robust growth, driven by the increasing demand for real-time data processing and analytics across various sectors. This report is essential for businesses seeking to understand and capitalize on the opportunities within this rapidly evolving market. Recent developments include: May 2022: Intesa Sanpaolo, one of the biggest banks in Italy, uses Optane DIMMs and in-memory software for its servers and makes applications run faster. With this, the bank is able to recover a database instance from storage drives in approximately two seconds with software-defined memory-to-memory services., March 2022: Hazelcast enhanced its in-memory data grid software with more SQL streaming data capabilities and tiering so that real-time and older information can be queried concurrently.. 
Key drivers for this market are: Increasing Need for Attaining Unprecedented Levels of Speed at Data Processing, Growth of Big Data. Potential restraints include: High Costs and Operational Concerns, Concerns related to Geoprivacy and Confidential Data. Notable trends are: Growing Need for Real Time Data Processing in BFSI Driving the Market Growth.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Clinton/Text-to-sql-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
THIS IS STILL WIP, PLEASE DO NOT CIRCULATE
About
This dataset contains counts of (referer, article) pairs extracted from the request logs of English Wikipedia. When a client requests a resource by following a link or performing a search, the URI of the webpage that linked to the resource is included in the request in an HTTP header called the "referer". This data captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.

Data Preparation
- The dataset only includes requests to articles in the main namespace of the desktop version of English Wikipedia (see https://en.wikipedia.org/wiki/Wikipedia:Namespace)
- Requests to MediaWiki redirects are excluded
- Spider traffic was excluded using the ua-parser library (https://github.com/tobie/ua-parser)
- Referers were mapped to a fixed set of values corresponding to internal traffic or external traffic from one of the top 5 global traffic sources of English Wikipedia, based on this scheme:
  - an article in the main namespace of English Wikipedia -> the article title
  - any Wikipedia page that is not in the main namespace of English Wikipedia -> 'other-wikipedia'
  - an empty referer -> 'other-empty'
  - a page from any other Wikimedia project -> 'other-internal'
  - Google -> 'other-google'
  - Yahoo -> 'other-yahoo'
  - Bing -> 'other-bing'
  - Facebook -> 'other-facebook'
  - Twitter -> 'other-twitter'
  - anything else -> 'other'
  For the exact mapping see https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql#L30-L48
- (referer, article) pairs with 10 or fewer observations were removed from the dataset

Note: When a user requests a page through the search bar, the page the user searched from is listed as a referer. Hence, the data contains (referer, article) pairs for which the referer does not contain a link to the article. For an example, consider the (Wikipedia, Chris_Kyle) pair: users went to the 'Wikipedia' article to search for Chris Kyle within English Wikipedia.

Applications
This data can be used for various purposes:
- determining the most frequent links people click on for a given article
- determining the most common links people followed to an article
- determining how much of the total traffic to an article clicked on a link in that article
- generating a Markov chain over English Wikipedia

Format
- prev_id: if the referer does not correspond to an article in the main namespace of English Wikipedia, this value will be empty. Otherwise, it contains the unique MediaWiki page ID of the article corresponding to the referer, i.e. the previous article the client was on
- curr_id: the MediaWiki unique page ID of the article the client requested
- n: the number of occurrences of the (referer, article) pair
- prev_title: the result of mapping the referer URL to the fixed set of values described above
- curr_title: the title of the article the client requested
License
All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

Source code
https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql (MIT license)
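As a concrete example of the first application listed above (the most frequent referers for a given article), a short pandas sketch follows. The file name, the tab separator, and the absence of a header row are assumptions for illustration; the column names follow the Format section.

import pandas as pd

# Assumed file name and TSV layout; columns as described in the Format section.
df = pd.read_csv("2015_01_clickstream.tsv", sep="\t",
                 names=["prev_id", "curr_id", "n", "prev_title", "curr_title"])

# Top referers for the Chris_Kyle article, mirroring the example above.
top = (df[df["curr_title"] == "Chris_Kyle"]
       .sort_values("n", ascending=False)
       .head(10)[["prev_title", "n"]])
print(top)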
As of July 2024, more than ********* of India's generative artificial intelligence (GenAI) startups were in the code and data segment, followed by audio and video segment startups at ** percent. Code and data GenAI provides features such as generating code and documents, as well as converting text to SQL.