I am trying to use the Deequ library to profile my data, among other things. I can see that AWS has now implemented the KLL algorithm (from Karnin et al., "Optimal Quantile Approximation in Streams", https://arxiv.org/pdf/1603.05346.pdf).
The paper explains the algorithm, which I am trying to understand, but I was wondering whether anyone in the community could provide a simpler overview. I am less interested in the mechanics of the algorithm (i.e. how it works) than in what the three parameters of the AWS implementation mean (sketch size, shrinksize, and bucket).
I appreciate any help or advice.