introduction

Lately I attempted talk that focuses on query languages. Most known query language is probably SQL. But this talk was about a different group of query languages. Those that help query unstructured data, extract or visualize useful information in tools like Grafana(PromQL), Loki(LogQL), Kibana(KQL), Splunk(SPL), …

The main focus was on excess of query languages :

  • observability is profitable bussinesr. This has resulted in the creation of a lot of companies, that each created their versions of query language
  • syntax of even simple queries in different query languages does not strive for any standard

better SQL

One of solutions seems to be ‘Pipe Syntax In SQL’. The concept was introduced in paper : SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL.

Bellow one example of SQL queries with complex flow (first page of above paper) :

SELECT c_count, COUNT(*) AS custdist
FROM
( SELECT c_custkey, COUNT(o_orderkey) c_count
FROM customer
LEFT OUTER JOIN orders ON c_custkey = o_custkey
AND o_comment NOT LIKE '%unusual%packages%'
GROUP BY c_custkey
) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

and its simplified thanks to top-to-bottom order that pipe syntax introduces :

FROM customer
|> LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%'
|> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey
|> AGGREGATE COUNT(*) AS custdist GROUP BY c_count
|> ORDER BY custdist DESC, c_count DESC;

Another example (this one is from ycombinator. Consider computing the average sales volume by category for the top 100 items. Here’s the vanilla SQL for it

WITH
  TopItems AS (
    SELECT category, sales_volume
    FROM Items
    ORDER BY sales_volume DESC
    LIMIT 100
  )
SELECT category, AVG(sales_volume) AS avg_sales_volume
FROM TopItems
GROUP BY category;

and version that uses pipes, that can be easly read from top to bottom

FROM Items
|> ORDER BY sales_volume DESC
|> LIMIT 100
|> AGGREGATE AVG(sales_volume) AS avg_sales_volume
   GROUP BY category

Currently you can use it in Google’s BigQuery, see Pipe query syntax for details.

Similar implementations exists Outsite Google’s BigQuery. See PSQL: a piped SQL for DuckDB repository. Or, PRQL ( Pipelined Relational Query Language, pronounced “Prequel” ) PRQL lang documentation.

conclusion

Currently, PipeSQL is not popular, it is not used in popular log / metrics analysis programs. I think ‘Pipe Syntax In SQL’ looks promising. Maybe soon, executing queries on data in different programs will not require learning a new query language each time.