Skipping Optional Arguments in ClickHouse Table Functions
Recently I had to insert data from JSON files stored in S3 into ClickHouse.
The files were gzipped, but didn’t contain the .gz
suffix, so ClickHouse kept
treating them as regular JSON files and throwing an error when trying to parse them.
When I looked at the signature of the s3 table function
I noticed a problem.
s3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,structure] [,compression_method])
The compression_method
argument I wanted to specify was behind two other arguments,
one of which I really didn’t want to specify.
First I tried specifying the argument as you would in a programming language like Python.
Then I tried using Snowflake-style =>
but it didn’t work either. So I gave up
on named arguments and tried finding an alternative approach.
After a decent amount of googling I stumbled upon this GitHub issue comment that looked promising, and indeed it was what I needed. In my case the query looked something like this:
SELECT * FROM s3( '<bucket_url>', '<aws_access_key_id>', '<aws_secret_access_key>', auto, auto, 'gz' )
Since I couldn’t find any details on auto
in ClickHouse documentation other than
a passing reference in the schema inference docs,
I was wondering if it worked for all functions or just table functions.
I decided to test it on the first function with more than one optional argument I
come by. I ended up trying with makeDateTime64
,
but it didn’t work.
SELECT makeDateTime64(2023, 2, 28, 17, 12, 33, auto, auto, 'Europe/Amsterdam');
So I guess, at least for now, we only get to use auto
with table functions.
If you've come this far with the article you may want to know a thing or two about me if you don't already. You can also read other blog posts or about stuff I've learned recently.
This website is open source. If you've come across a mistake please let me know there. For other types of feedback you can reach out to me through email or social media.