The lineitem table of the TPC-H benchmark's SF100 data set. Stored in Parquet files. Website: https://www.tpc.org/tpch/ Related publications: - TPC Benchmark H, Standard Specification Revision 2.17.2, 2017. https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.2.pdf - Meikel Poess: TPC-H. Encyclopedia of Big Data Technologies, 2019. https://link.springer.com/referenceworkentry/10.1007/978-3-319-63962-8_126-1
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TPC-H data (scale factor 10), Customers joined with Orders joined with Lineitems, translated into JSON with the following structure:
Array of Customer with nested `c_orders` which is an array of Orders, each order with nested `o_lineitems` which is an array of Lineitems.
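The nesting described above can be illustrated with a minimal Python sketch. The field values below are invented for illustration; only the structure (Customer → `c_orders` → `o_lineitems`) follows the description:

```python
import json

# Illustrative record only: the values are made up, but the nesting
# (Customer -> c_orders -> o_lineitems) mirrors the description above.
customer = {
    "c_custkey": 1,
    "c_name": "Customer#000000001",
    "c_orders": [
        {
            "o_orderkey": 100,
            "o_orderdate": "1995-01-01",
            "o_lineitems": [
                {"l_linenumber": 1, "l_quantity": 17, "l_extendedprice": 21168.23},
                {"l_linenumber": 2, "l_quantity": 36, "l_extendedprice": 45983.16},
            ],
        }
    ],
}

# The full data set is an array of such customer objects.
document = [customer]
print(json.dumps(document, indent=2)[:60])
```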
polars-tpch
This repo contains the code used for performance evaluation of polars. The benchmarks are TPC-standardised queries and data designed to test the performance of "real" workflows. From the TPC website:
TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision… See the full description on the dataset page: https://huggingface.co/datasets/kunishou/tpch_tables_scale_1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
TPC-H-scalar and Microbenchmark queries with scalar functions
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
To evaluate the performance of database applications and database management systems (DBMSs), we usually execute workloads of queries on generated databases of different sizes and then benchmark various measures such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of the generated data as well as the characteristics of the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture and the implementation algorithms of MyBenchmark. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads, including query workloads extracted from the TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The title of our paper submitted to CC20 is
Improving Database Query Performance with Automatic Fusion
This repository was created to demonstrate the reproducibility of the experiments in this paper. We provide the scripts and original data used in the experiments. Two systems are involved: HorsePower and the RDBMS MonetDB. We supply step-by-step instructions to configure and deploy both systems for the experiments.
On this page, you will see:
how to run the experiments (Section 2); and
the results used in the paper (Section 3).
All experiments were run on a server called sable-intel equipped with
Ubuntu 16.04.6 LTS (64-bit)
4× Intel Xeon E7-4850 @ 2.00 GHz
40 cores / 80 threads in total
128GB RAM
Docker setup
Download the docker image: cc20-docker.tar (About 13GB)
docker load < cc20-docker.tar
Generate a named container (then exit)
docker run --hostname sableintel -it --name=container-cc20 wukefe/cc20-docker
exit
Then, you can run the container
docker start -ai container-cc20
Open a new terminal to access the container (optional)
docker exec -it container-cc20 /bin/bash
Introduction to MonetDB
Work directory for MonetDB
/home/hanfeng/cc20/monetdb
Start MonetDB (use all available threads)
./run.sh start
Log in to MonetDB using its client tool, mclient
mclient -d tpch1
sql> SELECT 'Hello world';
+-------------+
| L2          |
+=============+
| Hello world |
+-------------+
1 tuple
Show the list of tables in the current database
sql> \d
TABLE  sys.customer
TABLE  sys.lineitem
TABLE  sys.nation
TABLE  sys.orders
TABLE  sys.part
TABLE  sys.partsupp
TABLE  sys.region
TABLE  sys.supplier
Leave the session
sql> \q
Stop MonetDB before continuing with the experiments
./run.sh stop
Reference: how to install MonetDB, and an introduction to its server and client programs.
Run MonetDB with TPC-H queries
MonetDB: server mode
Invoke MonetDB with a specific number of threads (e.g. 1)
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=1
Open a new terminal
docker exec -it container-cc20 /bin/bash
cd cc20/monetdb
Note: Type \q to exit the server mode.
Run with a specific number of threads (two terminals required)
1 thread
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=1
(time ./runtest | mclient -d tpch1) &> "log/log_thread_1.log"
2 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=2
(time ./runtest | mclient -d tpch1) &> "log/log_thread_2.log"
4 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=4
(time ./runtest | mclient -d tpch1) &> "log/log_thread_4.log"
8 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=8
(time ./runtest | mclient -d tpch1) &> "log/log_thread_8.log"
16 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=16
(time ./runtest | mclient -d tpch1) &> "log/log_thread_16.log"
32 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=32
(time ./runtest | mclient -d tpch1) &> "log/log_thread_32.log"
64 threads
mserver5 --set embedded_py=true --dbpath=/home/hanfeng/datafarm/2019/tpch1 --set monet_vault_key=/home/hanfeng/datafarm/2019/tpch1/.vaultkey --set gdk_nr_threads=64
(time ./runtest | mclient -d tpch1) &> "log/log_thread_64.log"
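The sweep above is mechanical, so the command pairs can be generated rather than typed by hand. A minimal Python sketch (a hypothetical helper, not part of the repository; the paths and flags are copied from the commands above, and the server and client commands still run in separate terminals as described earlier):

```python
# Hypothetical helper: emit the server/client command pair for each thread count.
DBPATH = "/home/hanfeng/datafarm/2019/tpch1"

def commands(threads):
    """Return the (server, client) shell command pair for one thread count."""
    server = (
        "mserver5 --set embedded_py=true "
        f"--dbpath={DBPATH} "
        f"--set monet_vault_key={DBPATH}/.vaultkey "
        f"--set gdk_nr_threads={threads}"
    )
    client = f'(time ./runtest | mclient -d tpch1) &> "log/log_thread_{threads}.log"'
    return server, client

# Print the full sweep used in the experiments.
for n in (1, 2, 4, 8, 16, 32, 64):
    server, client = commands(n)
    print(server)
    print(client)
```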
Post data processing - MonetDB
Fetch average execution time (ms)
grep -A 3 avg_query log/log_thread_1.log | python cut.py
699.834133333   // q1
85.9178666667   // q4
65.0172         // q6
101.730666667   // q12
58.212          // q14
60.1138666667   // q16
248.926466667   // q19
77.6482         // q22
grep -A 3 avg_query log/log_thread_2.log | python cut.py
grep -A 3 avg_query log/log_thread_4.log | python cut.py
grep -A 3 avg_query log/log_thread_8.log | python cut.py
grep -A 3 avg_query log/log_thread_16.log | python cut.py
grep -A 3 avg_query log/log_thread_32.log | python cut.py
grep -A 3 avg_query log/log_thread_64.log | python cut.py
Note: The above numbers can be copied to an Excel file for further analysis before plotting figures. Details can be found in Section 3.
Run with HorseIR
The HorsePower project can be found on GitHub. In the docker image, it has been placed in /home/hanfeng/cc20/horse.
https://github.com/Sable/HorsePower
Execution time
We then run each query 15 times to get the average execution time (ms).
(cd /home/hanfeng/cc20/horse/ && time ./run_all.sh)
The script run_all.sh runs three versions of the generated C code, corresponding to different levels of optimization.
For each version, it first compiles the C code and then runs the generated binary with different numbers of threads (i.e. 1/2/4/8/16/32/64). Each run computes a query 15 times and returns the average.
All output is saved into log files; for example, log/naive/log_q6.txt contains the results of query 6 in the naive version for all thread counts.
Log file structures
log/naive/*.txt
log/opt1/*.txt
log/opt2/*.txt
Fetch a brief summary of execution time from a log file
cat log/naive/log_q6.txt | grep -E 'Run with 15 times'
q06>> Run with 15 times, last 15 average (ms): 266.638 | 278.999 266.134 266.417 <12 more>   # 1 thread
q06>> Run with 15 times, last 15 average (ms): 138.556 | 144.474 137.837 137.579 <12 more>   # 2 threads
q06>> Run with 15 times, last 15 average (ms): 71.8851 | 75.339 72.102 72.341 <12 more>      # 4 threads
q06>> Run with 15 times, last 15 average (ms): 73.111 | 75.867 72.53 72.936 <12 more>        # 8 threads
q06>> Run with 15 times, last 15 average (ms): 56.1003 | 59.263 56.057 56.039 <12 more>      # 16 threads
q06>> Run with 15 times, last 15 average (ms): 56.8858 | 59.466 56.651 57.109 <12 more>      # 32 threads
q06>> Run with 15 times, last 15 average (ms): 53.4254 | 55.884 54.457 52.878 <12 more>      # 64 threads
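The per-thread averages in such lines can also be pulled out programmatically. A sketch of one way to do it (`parse_avg` is a hypothetical helper, not the repository's own tooling; the line format is taken from the excerpt above):

```python
import re

# Matches lines like:
#   q06>> Run with 15 times, last 15 average (ms): 266.638 | 278.999 ...
LINE_RE = re.compile(r"(q\d+)>> Run with 15 times, last 15 average \(ms\): ([\d.]+)")

def parse_avg(log_text):
    """Return (query_id, average_ms) pairs in file order."""
    return [(m.group(1), float(m.group(2))) for m in LINE_RE.finditer(log_text)]

sample = ("q06>> Run with 15 times, last 15 average (ms): 266.638 | "
          "278.999 266.134 266.417 <12 more>   # 1 thread")
print(parse_avg(sample))  # [('q06', 266.638)]
```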
Extracting this information by hand for all queries across the three versions quickly becomes tedious, so we provide a simple script for it.
./run.sh fetch log | python gen_for_copy.py
The output data has the following format
// query id
| ... | ... | ... |   # 1 thread
| ... | ... | ... |   # 2 threads
...
| ... | ... | ... |   # 64 threads
Note that we copy the generated numbers into the Excel file described in Section 3, where we compare the performance of MonetDB against the different versions of the generated C code.
Compilation time
Work directory
/home/hanfeng/cc20/horse/codegen
Fetch compilation time for different kinds of C code
./run.sh compile naive &> log_cc20_compile_naive.txt
./run.sh compile opt1 &> log_cc20_compile_opt1.txt
./run.sh compile opt2 &> log_cc20_compile_opt2.txt
Let's look into the result of query 1 in the log file log_cc20_compile_naive.txt.
Time variable                        usr           sys          wall            GGC
 phase setup              : 0.00 ( 0%)   0.00 ( 0%)   0.01 ( 5%)   1266 kB ( 18%)
 phase parsing            : 0.07 ( 54%)  0.07 ( 88%)  0.14 ( 64%)  3897 kB ( 55%)
 phase opt and generate   : 0.06 ( 46%)  0.01 ( 12%)  0.07 ( 32%)  1899 kB ( 27%)
 dump files               : 0.00 ( 0%)   0.00 ( 0%)   0.02 ( 9%)      0 kB ( 0%)
 df reg dead/unused notes : 0.01 ( 8%)   0.00 ( 0%)   0.00 ( 0%)     31 kB ( 0%)
 register information     : 0.00 ( 0%)   0.00 ( 0%)   0.01 ( 5%)      0 kB ( 0%)
 preprocessing            : 0.03 ( 23%)  0.02 ( 25%)  0.08 ( 36%)  1468 kB ( 21%)
 lexical analysis         : 0.00 ( 0%)   0.03 ( 38%)  0.05 ( 23%)     0 kB ( 0%)
 parser (global)          : 0.04 ( 31%)  0.02 ( 25%)  0.01 ( 5%)   2039 kB ( 29%)
 tree SSA other           : 0.00 ( 0%)   0.01 ( 12%)  0.00 ( 0%)      3 kB ( 0%)
 integrated RA            : 0.01 ( 8%)   0.00 ( 0%)   0.01 ( 5%)    726 kB ( 10%)
 thread pro- & epilogue   : 0.02 ( 15%)  0.00 ( 0%)   0.00 ( 0%)     41 kB ( 1%)
 shorten branches         : 0.00 ( 0%)   0.00 ( 0%)   0.01 ( 5%)      0 kB ( 0%)
 final                    : 0.00 ( 0%)   0.00 ( 0%)   0.01 ( 5%)     56 kB ( 1%)
 initialize rtl           : 0.01 ( 8%)   0.00 ( 0%)   0.01 ( 5%)     12 kB ( 0%)
 rest of compilation      : 0.01 ( 8%)   0.00 ( 0%)   0.00 ( 0%)     62 kB ( 1%)
 TOTAL                    : 0.13         0.08         0.22         7072 kB
The compilation time is split into many parts; we take the total wall time as the actual time spent on code compilation. For this query, the whole compilation takes 0.22 seconds. (Note that retrieving the compilation time from the logs requires manual work.)
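Since retrieving the total wall time is otherwise manual, a small parser can help. A sketch assuming the logs contain GCC time-report output in the layout shown above (`total_wall_time` is a hypothetical helper, not part of the repository):

```python
import re

def total_wall_time(report):
    """Pull (usr, sys, wall) seconds from the TOTAL line of a GCC time report."""
    m = re.search(r"TOTAL\s*:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)", report)
    if m is None:
        raise ValueError("no TOTAL line found in report")
    return tuple(float(g) for g in m.groups())

sample = " TOTAL                    : 0.13         0.08         0.22         7072 kB"
print(total_wall_time(sample))  # (0.13, 0.08, 0.22)
```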
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
The coordination chemistry of 2,2′-bis(di-iso-propylphosphino)-trans-stilbene (tPCHCHP) with group 10 metal centers in a variety of oxidation states is reported; different coordination modes were observed depending on the oxidation state of the metal. With metal centers in the 0 or +1 oxidation state ((tPCHCHP)Ni, [(tPCHCHP)Pd]2, (tPCHCHP)NiCl, (tPCHCHP)NiI), η2 coordination of the olefin occurs, whereas, with metals in the +2 oxidation state, C–H activation of the backbone, followed by rapid H–X reductive elimination, was observed, leading to an η1 coordination of the backbone in (tPCCHP)MCl (M = Ni, Pd, Pt). Employing the methyl-substituted analogue, 2,2′-bis(di-iso-propylphosphino)-trans-diphenyl-1,2-dimethylethene (tPCMeCMeP), forced an η2 coordination of the olefin in [(tPCMeCMeP)NiCl]2[NiCl4]. The synthesis of the hydride complex (tPCCHP)NiH was attempted, but, instead, led to the formation of (tPCHCHP)Ni, indicating that the vinyl form of the backbone can function as a hydrogen acceptor. All metal complexes were characterized by multinuclei NMR spectroscopy, X-ray crystallography, and elemental analysis.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
The synthesis and characterization of the two iron chlorin complexes [FeIII(TPC)(NH2CH(CO2CH3)(CH(CH3)2))2]CF3SO3 (1) and FeII(TPC)[(NH2CH(CO2CH3)(CH(CH3)2)]2 (2) are reported. The crystal structure of complex 1 has been determined. The X-ray structure shows that the porphyrinate rings are weakly distorted. The metal−nitrogen distances to the reduced pyrrole N(4), 2.034(4) Å, and to the pyrrole trans to it N(2), 2.012(4) Å, are longer than the distances to the two remaining nitrogens [N(1), 1.996(4) Å, and N(3), 1.984(4) Å], leading to a core−hole expansion of the macrocycle due to the reduced pyrrole. The 1H NMR isotropic shifts at 20 °C of the different pyrrole protons of 1 varied from −0.8 to −48.3 ppm according to bis-ligated complexes of low-spin ferric chlorins. The EPR spectrum of [Fe(TPC)(NH2CH(CO2CH3)(CH(CH3)2))2]CF3SO3 (1) in solution is rhombic and gives the principal g values g1 = 2.70, g2 = 2.33, and g3 = 1.61 (∑g2 = 15.3). These spectroscopic observations are indicative of a metal-based electron in the dπ orbital for the [Fe(TPC)(NH2CH(CO2CH3)(CH(CH3)2))2]CF3SO3 (1) complex with a (dxy)2(dxzdyz)3 ground state at any temperature. The X-ray structure of the ferrous complex 2 also shows that the porphyrinate rings are weakly distorted. The metal−nitrogen distances to the reduced pyrrole N(4), 1.991(5) Å, and to the pyrrole trans to it N(2), 2.005(6) Å, are slightly different from the distances to the two remaining nitrogens [N(1), 1.988(5) Å, and N(3), 2.015(5) Å], leading to a core−hole expansion of the macrocycle due to the reduced pyrrole.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Proton assignment in 1H-NMR [47,48].
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This file includes all the test data of the asphalt binders used in this study. (XLSX)