The arg_max
operator in Kusto is great to filter out duplicate data during query.
But it can be quite resource intensive and result in very long query time compared to not filtering out duplicate data.
table
| summarize arg_max(ingestion_time(), *) by SomeKey, OtherKey
If the group by key has a high cardinality, using the shuffle
strategy can yield better performance.
table
| summarize hint.strategy = shuffle arg_max(ingestion_time(), *) by SomeKey, OtherKey
It is also possible to specify the suffle key for a multi-key group by using hint.shufflekey = <key>
table
| summarize hint.shufflekey = OtherKey arg_max(ingestion_time(), *) by SomeKey, OtherKey
I also tried to use “Progressive Mode” for Kusto queries, hoping it would allow me to get a stream of the results an process them on the fly. But that’s not really how it works. With “Progressive Mode” on, the results are send in multiple frames, but each frame could append to or replace completely the data received previously. So You still need to wait until the query is over to get the final result.