You are an expert in SQL and Tinybird. Follow these instructions when working with .datasource and .pipe files:
<command_calling>
You have commands at your disposal to develop a Tinybird project:
- tb build: to build the project locally and check it works.
- tb deployment create --wait --auto: to create a deployment and promote it automatically.
- tb test run: to run existing tests.
- tb --build endpoint url <pipe_name>: to get the url of an endpoint, token included.
- tb --build endpoint data <pipe_name>: to get the data of an endpoint. You can pass parameters to the endpoint like this: tb --build endpoint data <pipe_name> --param1 value1 --param2 value2
</command_calling>
When you have a ClickHouse cluster and you run queries on all the replicas, it's not easy to see every query that ran. I use system.query_log
all the time to check timings, errors and so on.
So what I do is create a global query_log:
:) create view query_log_all on cluster my_cluster as select * from remote('10.0.0.1,10.0.0.2', 'system.query_log')
So I can inspect queries in all the replicas with a single query:
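For example, to look at the slowest queries of the day across every replica (the filter and columns here are just an illustration):

:) select event_time, query_duration_ms, query from query_log_all where event_date = today() and type = 'QueryFinish' order by query_duration_ms desc limit 10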
ClickHouse has a pretty good endpoint, /replicas_status,
which gives information about the, guess what, replication status. When you are working on a cluster that uses replication to increase the amount of QPS you can serve, you usually have a load balancer in front, something like this:
                +--------------+
                |              |
       +------->+  clickhouse  |
       |        +--------------+
+------+-----+
|            |
|     LB     |
|            |
+------+-----+
       |        +--------------+
       +------->+  clickhouse  |
                |              |
                +--------------+
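The endpoint is what you point the balancer's health check at. If you'd rather inspect replication lag from SQL, system.replicas exposes it too (a sketch; the 10 second threshold is just an example):

:) select database, table, is_readonly, absolute_delay from system.replicas where absolute_delay > 10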
Sometimes you have to move data from one table to a different one. You usually use
insert into target select * from source
This works but has several problems:
- materialized columns are not properly copied
- it's slow
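When both tables share exactly the same structure and partition key there is a much faster alternative: attaching the partitions directly. Parts are copied as-is, so nothing is re-encoded and materialized columns should come along unchanged (a sketch; the partition value depends on your partitioning expression):

:) alter table target attach partition 202005 from source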
Looks like filters are pushed down when filtering a UNION ALL. This is also an example of how to use EXPLAIN
in ClickHouse, and a different way of seeing what is going on is through the traces; lines like these show how much data ClickHouse is reading:
Selected 1 parts by date, 1 parts by key, 2 marks by primary key, 2 marks to read from 1 ranges
Reading approx. 16384 rows with 1 streams
The example:
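This is a sketch with made-up table names; indexes = 1 makes EXPLAIN show how many parts and marks each branch selects, and setting send_logs_level = 'trace' in clickhouse-client prints reading lines like the ones above:

explain indexes = 1
select *
from
(
    select * from events_a
    union all
    select * from events_b
)
where key = 42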
ClickHouse has a powerful feature, JOIN engines, that lets you prepare a table to be joined with better performance than a regular table (MergeTree, Log...). It also lets you use joinGet
to fetch table values using a key.
Sometimes you don't have a Join table but you'd like something with joinGet's performance. Unfortunately you can't use joinGet with something created on the fly (well, you could create a temporary join table, but that takes several SQL queries).
So there is a way to do that, using transform:

with (
    select (groupArray(key), groupArray(value)) from my_table
) as key_value
-- other_table and its id column are placeholders for the data being enriched
select transform(id, key_value.1, key_value.2, 0) from other_table
SummingMergeTree can also sum Nested columns whose name ends in Map, and the map can carry more than one key column:
MacBook-Pro-de-javi.local :) create table multiple_keys (tmp Int32, testMap Nested (a Int32, bKey Int32, value Int32)) Engine=SummingMergeTree() order by (tmp);
CREATE TABLE multiple_keys
(
    `tmp` Int32,
    `testMap` Nested(
    a Int32,
    bKey Int32,
    value Int32)
)
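To see the summing in action, insert two rows with the same keys and force a merge (a sketch; with the default flatten_nested behavior each Nested subcolumn is written as an array):

:) insert into multiple_keys values (1, [1], [10], [100])
:) insert into multiple_keys values (1, [1], [10], [50])
:) optimize table multiple_keys final
:) select * from multiple_keys

If both a and bKey behave as keys of the map, the merged row comes back with testMap.value = [150].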
These are the steps I followed to compile ClickHouse on OSX Mojave (10.14.3). This might not be the best way to make it compile on OSX, since my knowledge of C++ and build tooling is really limited, but it gets the job done.
clickhouse sha-1: 2ad4df1d6a
compile command (apply the patch below before running cmake)