https://creativecommons.org/publicdomain/zero/1.0/
I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. I also tend to write queries far faster in SQL Server Management Studio, like 100x faster. So I ported Kaggle's Meta Kaggle dataset into Microsoft SQL Server 2022 database format, created a backup file, and uploaded it here.
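To work with the backup locally, it can be restored into a SQL Server 2022 instance. The sketch below is a minimal example only; the file name, logical file names, and paths are assumptions, so check them with RESTORE FILELISTONLY and adjust them to your environment.

-- Inspect the logical file names inside the backup first (path is an assumption).
RESTORE FILELISTONLY FROM DISK = N'C:\Backups\MetaKaggle.bak';

-- Restore the database, relocating the data and log files (names and paths assumed).
RESTORE DATABASE MetaKaggle
FROM DISK = N'C:\Backups\MetaKaggle.bak'
WITH MOVE N'MetaKaggle'     TO N'C:\Data\MetaKaggle.mdf',
     MOVE N'MetaKaggle_log' TO N'C:\Data\MetaKaggle_log.ldf',
     STATS = 10;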
Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks), and more. Meta Kaggle may not be the Rosetta Stone of data science, but there is a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle's community and activity.
Financial overview and grant giving statistics of Jacksonville Sql Server Users Group Inc.
These files contain SET monitoring data collected at Assateague Island National Seashore
Financial overview and grant giving statistics of Capital Area Sql Server Group
These files contain SET monitoring data collected at Colonial National Historical Site
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Publicly accessible databases often impose query limits or require registration. Even though I maintain public, limit-free APIs, I never wanted to host a public database, because I tend to think that connection strings are a problem for the user.
I've decided to host several light/medium-sized datasets using PostgreSQL, MySQL, and SQL Server backends (in strict descending order of preference!).
Why three database backends? There are a ton of small edge cases when moving between database backends, so testing against live databases is quite valuable. With this resource you can benchmark speed, compression, and DDL types.
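As an illustration of the kind of dialect edge case involved, here is the same made-up table declared for two of the backends; the table and column names are purely illustrative and are not part of the hosted datasets.

-- PostgreSQL
CREATE TABLE trade (
    id         BIGSERIAL PRIMARY KEY,
    country    TEXT          NOT NULL,
    trade_usd  NUMERIC(18,2),
    created_at TIMESTAMPTZ   DEFAULT now()
);

-- SQL Server: no SERIAL, no TIMESTAMPTZ, different default-value functions
CREATE TABLE trade (
    id         BIGINT IDENTITY(1,1) PRIMARY KEY,
    country    NVARCHAR(255) NOT NULL,
    trade_usd  DECIMAL(18,2),
    created_at DATETIME2     DEFAULT SYSUTCDATETIME()
);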
Please send me a tweet if you need the connection strings for your lectures or workshops; my Twitter username is @pachamaltese. See the SQL dumps in each section to get the data locally.
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world's growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
As of December 2022, relational database management systems (RDBMS) were the most popular type of DBMS, accounting for a ** percent popularity share. The most popular RDBMS in the world has been reported to be Oracle, while MySQL and Microsoft SQL Server rounded out the top three.
sqlserver-querystore-timeseries
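Judging by the name, this appears to be a time series of SQL Server Query Store runtime statistics. A minimal sketch of the kind of query that yields such a series, using the standard Query Store catalog views, is shown below; the column selection is an assumption about what the dataset contains.

-- Per-interval runtime statistics joined to the query text (illustrative only).
SELECT  i.start_time,
        i.end_time,
        q.query_id,
        t.query_sql_text,
        rs.count_executions,
        rs.avg_duration,
        rs.avg_cpu_time,
        rs.avg_logical_io_reads
FROM    sys.query_store_runtime_stats          AS rs
JOIN    sys.query_store_runtime_stats_interval AS i ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
JOIN    sys.query_store_plan                   AS p ON p.plan_id  = rs.plan_id
JOIN    sys.query_store_query                  AS q ON q.query_id = p.query_id
JOIN    sys.query_store_query_text             AS t ON t.query_text_id = q.query_text_id
ORDER BY i.start_time, q.query_id;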
Financial overview and grant giving statistics of Lincoln Sql Server User Group Inc.
These files contain SET monitoring data collected at Cape Cod National Seashore
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union-query SQL injection and Blind SQL injection, performed with the SQLMAP tool.
The NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
Datasets
The first dataset was collected to train the detection models (D1); the other was collected, using different attacks than those used in training, to test the models and ensure their generalization (D2).
The datasets contain both benign and malicious traffic. All collected datasets are balanced.
The version of NetFlow used to build the datasets is 5.
Dataset | Aim      | Samples | Benign-malicious traffic ratio
D1      | Training | 400,003 | 50%
D2      | Test     | 57,239  | 50%
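Since the flows follow the NetFlow v5 format, each record carries the standard v5 fields (addresses, ports, protocol, TCP flags, packet and byte counters, timestamps). A hedged sketch of a SQL Server table for loading such records follows; the column list is a subset of the v5 fields, and the label column is an assumption about how benign and malicious traffic are marked.

-- Hypothetical staging table for NetFlow v5 records (subset of v5 fields).
CREATE TABLE netflow_v5_flows (
    src_addr   VARCHAR(15) NOT NULL,  -- source IPv4 address
    dst_addr   VARCHAR(15) NOT NULL,  -- destination IPv4 address
    src_port   INT         NOT NULL,
    dst_port   INT         NOT NULL,
    protocol   TINYINT     NOT NULL,  -- IP protocol number (6 = TCP, 17 = UDP)
    tcp_flags  TINYINT     NOT NULL,  -- cumulative OR of TCP flags seen in the flow
    packets    BIGINT      NOT NULL,  -- packets in the flow
    octets     BIGINT      NOT NULL,  -- bytes in the flow
    first_seen DATETIME2   NOT NULL,  -- flow start time
    last_seen  DATETIME2   NOT NULL,  -- flow end time
    label      BIT         NOT NULL   -- 0 = benign, 1 = malicious (assumption)
);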
Infrastructure and implementation
Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.
DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or after it has been active for 1,800 seconds (30 minutes).
Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. These tasks run as Python scripts; users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks: on the one hand, it routes packets to the Internet; on the other hand, it sends them to a NetFlow data generation node (packets received from the Internet are handled in the same way).
The malicious traffic collected (SQLIA) was generated using SQLMAP, a penetration testing tool that automates the process of detecting and exploiting SQL injection vulnerabilities.
The attacks were executed from 16 nodes, which launched SQLMAP with the parameters listed in the following table.
Parameters | Description
'--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments' | Enumerate users, password hashes, privileges, roles, databases, tables and columns
--level=5 | Increase the probability of a false positive identification
--risk=3 | Increase the probability of extracting data
--random-agent | Select the User-Agent randomly
--batch | Never ask for user input; use the default behavior
--answers="follow=Y" | Predefined answers to yes
Every attacking node executed SQLIA against 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either the MySQL or the SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).
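For illustration, this is what a Union-type injection does to the query behind such a form; the table and column names below are made up and are not part of the dataset.

-- Query the vulnerable form intends to run:
SELECT name, price FROM products WHERE id = 1;

-- The same query after injecting the input:  1 UNION SELECT username, password_hash FROM users --
SELECT name, price FROM products WHERE id = 1
UNION SELECT username, password_hash FROM users -- ;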
The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.
However, for D2, Blind SQL injection attacks were performed against a web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic-generating nodes and 140.30.20.1/24 for victim nodes.
The MySQL server was MariaDB version 10.4.12; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.
https://www.marketreportanalytics.com/privacy-policy
The Structured Query Language (SQL) server transformation market is experiencing robust growth, projected to reach $15 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 9.4% from 2025 to 2033. This expansion is fueled by several key drivers. The increasing adoption of cloud-based solutions and the rise of big data analytics are pushing organizations to adopt more efficient and scalable SQL server solutions. Furthermore, the growing demand for real-time data processing and improved data integration capabilities within large enterprises and SMEs is significantly driving market growth.
The market segmentation reveals strong demand across various application areas, with large enterprises leading the way due to their greater need for robust and scalable data management infrastructure. Data integration scripts remain a prominent segment, highlighting the critical need for seamless data flow across diverse systems. The competitive landscape is marked by established players like Oracle, IBM, and Microsoft, alongside emerging innovative companies specializing in cloud-based SQL server technologies. Geographic analysis suggests North America and Europe currently hold the largest market share, but significant growth potential exists in the Asia-Pacific region, driven by rapid digital transformation and economic growth in countries like India and China.
The restraints on market growth are primarily related to the complexities involved in migrating existing legacy systems to new SQL server solutions, along with the need for skilled professionals to manage and optimize these systems. However, the ongoing advancements in automation tools and the increased availability of training programs are mitigating these challenges. The future trajectory of the market indicates continued growth, driven by emerging technologies such as AI-powered query optimization, enhanced security features, and the growing adoption of serverless architectures. This will lead to a wider adoption of SQL server transformation across various sectors, including finance, healthcare, and retail, as organizations seek to leverage data to gain competitive advantage and improve operational efficiency. The market is ripe for innovation and consolidation, with opportunities for both established players and new entrants to capitalize on this ongoing transformation.
https://www.datainsightsmarket.com/privacy-policy
The global database market is booming, projected to reach [estimated 2033 market size in billions] by 2033, growing at a CAGR of 14.21%. This report analyzes market drivers, trends, restraints, and key players like MongoDB, Amazon, and Microsoft across cloud, on-premises, and various industry verticals. Discover insights into market segmentation and regional growth. Recent developments include: January 2024: Microsoft and Oracle announced the general availability of Oracle Database@Azure, allowing Azure customers to procure, deploy, and use Oracle Database@Azure with the Azure portal and APIs. November 2023: VMware, Inc. and Google Cloud announced an expanded partnership to deliver Google Cloud's AlloyDB Omni database on VMware Cloud Foundation, starting with on-premises private clouds. Key drivers for this market are: Increasing Penetration Of Trends Like Big Data And IoT, Increase In The Volume Of Data Generated And Shift Of Enterprise Operations. Potential restraints include: Increasing Penetration Of Trends Like Big Data And IoT, Increase In The Volume Of Data Generated And Shift Of Enterprise Operations. Notable trends are: Retail and E-commerce to Hold Significant Share.
The global database management system (DBMS) market revenue grew to ** billion U.S. dollars in 2020. Cloud DBMS accounted for the majority of the overall market growth, as database systems are migrating to cloud platforms.
Database market
The database market consists of paid database software such as Oracle and Microsoft SQL Server, as well as free, open-source options like PostgreSQL and MongoDB. Database management systems (DBMSs) provide a platform through which developers can organize, update, and control large databases, with products like Oracle, MySQL, and Microsoft SQL Server being the most widely used in the market.
Database management software
Knowledge of the programming languages related to these databases is becoming an increasingly important asset for software developers around the world, and skills in database systems such as MongoDB and Elasticsearch are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "SQL Server Advanced Data Types: JSON, XML, and Beyond". It features 7 columns, including author, publication date, language, and book publisher.
https://www.statsndata.org/how-to-order
The SQL Server Transformation market is rapidly evolving, driven by the increasing need for organizations to harness data effectively for decision-making and operational efficiency. This market encompasses various processes and technologies that facilitate the migration, integration, and transformation of data withi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset created as part of the Master Thesis "Business Intelligence – Automation of Data Marts modeling and its data processing".
Lucerne University of Applied Sciences and Arts
Master of Science in Applied Information and Data Science (MScIDS)
Autumn Semester 2022
Change log Version 1.1:
The following SQL scripts were added:
Index | Type             | Name
1     | View             | pg.dictionary_table
2     | View             | pg.dictionary_column
3     | View             | pg.dictionary_relation
4     | View             | pg.accesslayer_table
5     | View             | pg.accesslayer_column
6     | View             | pg.accesslayer_relation
7     | View             | pg.accesslayer_fact_candidate
8     | Stored Procedure | pg.get_fact_candidate
9     | Stored Procedure | pg.get_dimension_candidate
10    | Stored Procedure | pg.get_columns
The scripts are based on Microsoft SQL Server 2017 and are compatible with a data warehouse built with Datavault Builder. The object scripts of the sample data warehouse itself are restricted and cannot be shared.
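Since the scripts target SQL Server 2017, a data-dictionary view such as pg.dictionary_table is presumably built over the system catalog. The sketch below is a hypothetical reconstruction, not the actual script shipped with the dataset; the column list is an assumption.

-- Hypothetical reconstruction; assumes the pg schema already exists.
CREATE VIEW pg.dictionary_table
AS
SELECT  s.name        AS schema_name,
        t.name        AS table_name,
        t.create_date,
        t.modify_date,
        SUM(p.rows)   AS row_count
FROM    sys.tables     AS t
JOIN    sys.schemas    AS s ON s.schema_id = t.schema_id
JOIN    sys.partitions AS p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY s.name, t.name, t.create_date, t.modify_date;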
The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.
Data Limitations:
Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system, Bidsync, is being deprecated, and these issues will be resolved in the future as state systems transition to Fi$cal.
Data Collection Methodology:
The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department, and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed, and extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete, the data is uploaded into the SQL Server database.
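A sketch of the join-and-update step described above is shown below; the table and column names are assumptions for illustration, not the actual SCPRS schema.

-- Hypothetical example: enrich the final dataset table from the reference tables.
UPDATE  pod
SET     pod.supplier_name   = s.supplier_name,
        pod.department_name = d.department_name,
        pod.unspsc_category = u.category_title
FROM    dbo.PurchaseOrderDataset AS pod
JOIN    dbo.PurchaseOrder        AS po ON po.purchase_order_id = pod.purchase_order_id
JOIN    dbo.Supplier             AS s  ON s.supplier_id   = po.supplier_id
JOIN    dbo.Department           AS d  ON d.department_id = po.department_id
JOIN    dbo.UNSPSC               AS u  ON u.unspsc_code   = po.unspsc_code;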
Secondary/Related Resources: