Once you have logged in, you can vend BD (BoilingData) STS tokens for yourself. These tokens let you run queries through the Boiling HTTPFS+Parquet API, e.g. directly from DuckDB or with plain command-line curl/wget, and optionally get the SQL query results as a Parquet file.
The BD STS token is a short-term JWT. When you use it, the BD backend authenticates it (verifies the token and identifies you) and grants access to your data exactly as if you were running SQL queries in the BoilingData GUI. Without any further options, the token's scope is the same as that of the IAM Role you created in your AWS account with bdcli.
NOTE! You only need your own AWS account if you want to share YOUR data.
Like AWS STS credentials, the JWT is valid only for a short time, and you can vend new ones whenever you need them.
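Because the token is a JWT, you can inspect its payload locally, for example to see when it expires. The shell sketch below assumes a standard three-part JWT (header.payload.signature) whose payload carries the registered "exp" claim (expiry as Unix seconds). BoilingData does not document the payload here, so treat both of these as assumptions. The hypothetical helpers `jwt_payload` and `token_seconds_left` only decode the payload. They do not verify the signature, which remains the BD backend's job.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the BD STS token is a standard JWT
# (header.payload.signature) whose payload has an "exp" claim.
# Nothing here verifies the signature; the BD backend does that.

# Print the decoded JSON payload (middle segment) of a JWT.
jwt_payload() {
  local payload
  # Take the payload segment and map base64url characters back to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the trailing '=' padding; restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Print the seconds left until the token's "exp" time (negative if already expired).
# Returns non-zero if no "exp" claim is found.
token_seconds_left() {
  local exp
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$exp" ] || return 1
  echo $(( exp - $(date +%s) ))
}
```

For example, `token_seconds_left "$BD_TOKEN"` prints the remaining seconds, where `BD_TOKEN` is a hypothetical variable holding a token you have vended. The sketch uses `base64 -d` as in GNU coreutils. Some older macOS releases spell this `base64 -D`.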
NOTE! You can vend multiple tokens, but for now they cannot be revoked. The default lifetime of a token is 1h; you can change it with the --lifetime option. The maximum lifetime is currently 24h. For example, to vend a token valid for 8 hours:
bdcli account sts-token --lifetime '8h'
You can further limit the scope of the token with an optional SQL clause, and choose its validity period (default 1h). The SQL clause can also contain a GROUP BY or any other aggregation that Boiling supports. In essence, it defines which data is heated up from S3 for your queries. For example:
# --lifetime: single-token expiration time (default 1h), given as a string in the https://github.com/vercel/ms format
bdcli account sts-token \
  --lifetime '2h' \
  --sql "SELECT passenger_count, tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance FROM parquet_scan('s3://boilingdata-demo/demo.parquet') WHERE vendorid=3;" \
  --name pcounts \
  --duckdbrc
The command above generates a token that can access only the in-memory table produced by the SQL query. That means just those columns, and only the rows that belong to VendorId 3. The token is valid for 2 hours from now.
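The validity period is simply the lifetime counted from the moment the token is vended. As a rough illustration, the sketch below converts simple lifetime strings such as '2h' into seconds and checks them against the documented 24h maximum before you call bdcli. It handles only the "<integer><unit>" forms with s, m, h or d. The real vercel/ms format accepts more (e.g. '1.5h' or '2 hours'), and the hypothetical helpers `lifetime_seconds` and `lifetime_ok` are not part of bdcli.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: lifetimes look like "<integer><unit>" with unit
# s, m, h or d, a small subset of the vercel/ms format that --lifetime accepts.

# Print the lifetime in seconds, or fail on an unsupported value.
lifetime_seconds() {
  local value=${1%[smhd]}   # strip one trailing unit letter
  local unit=${1#"$value"}  # the stripped unit (empty if there was none)
  case $value in
    # Reject empty, non-integer, and leading-zero values (the latter would be read as octal).
    ''|*[!0-9]*|0?*) echo "unsupported lifetime: $1" >&2; return 1 ;;
  esac
  case $unit in
    s) echo $(( value )) ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    *) echo "unsupported lifetime: $1" >&2; return 1 ;;
  esac
}

# Succeed only if the lifetime is within the documented 24h maximum.
lifetime_ok() {
  local secs
  secs=$(lifetime_seconds "$1") || return 1
  [ "$secs" -le $(( 24 * 3600 )) ]
}
```

For example, `lifetime_ok '8h' && bdcli account sts-token --lifetime '8h'` only calls bdcli when the value passes this local check. The backend still enforces the real limit.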
With the --duckdbrc switch, the token is stored in your ~/.duckdbrc file as a DuckDB table macro. The macro takes no arguments, because the "pushed-down query" is already defined by the token. You can query it directly:
SELECT * FROM pcounts();
NOTE! This way you can create short-lived views with both row- and column-level security. For example, you can segment data access for customers/partners, or give various roles in your company limited (permanent or temporary) access. If a query made with the token references columns outside its scope, Boiling responds as it would for any table, e.g. with a "column not found" error or with "no results".
To make these tokens useful in practice, you can create tokens and share them with other users. A shared token is bound to target users/groups. Only those users can use it, and only within the time period you specified (e.g. 8 hours starting now).
# --vending-schedule (optional): cron expression, in UTC, for when token vending is allowed. Note that once a token is vended, it is valid for <lifetime>.
# --name (optional): name for the data share
bdcli account token-share \
  --lifetime '1h' \
  --vending-schedule '* * * * * *' \
  --sql "SELECT vendorid, passenger_count, tpep_pickup_datetime FROM parquet_scan('s3://boilingdata-demo/demo.parquet') WHERE vendorid=3;" \
  --users dan@boilingdata.com,bill@boutlook.com,anne@gmail.com \
  --name bdnyc
You can list all data shared with you:
bdcli account token-list
Usage scenario: you can use BoilingData without configuring any data access of your own (no data of your own, not even your own AWS account). Just get a Boiling subscription and start querying data shared with you.