Read the record batches from the Arrow files in the staging directory and run DataFusion queries to fetch the count, the distinct count, and the count of each distinct value for every field in the dataset.
Store the results in the `<dataset>_pmeta` dataset.
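The per-field statistics described above can be sketched with plain SQL. This is a minimal illustration under stated assumptions, not the actual implementation. A dict of column lists stands in for an Arrow record batch, and Python's built-in `sqlite3` stands in for DataFusion. The flattened row keys (`field_stats_field_name`, `field_stats_count`, `field_stats_distinct_count`, `field_stats_distinct_stats_distinct_value`, `field_stats_distinct_stats_count`) are inferred from the column names the UI query reads, so the real `<dataset>_pmeta` schema may differ. `batch_field_stats` and `_quote` are hypothetical helpers.

```python
import sqlite3


def _quote(name):
    # Quote a column name as an SQL identifier (double any embedded quotes).
    return '"' + name.replace('"', '""') + '"'


def batch_field_stats(batch):
    """Compute count, distinct count and per-value counts for every field.

    `batch` maps column name -> list of values (a stand-in for an Arrow
    RecordBatch); SQLite stands in for DataFusion. Returns flattened rows
    whose keys mirror the (assumed) <dataset>_pmeta columns.
    """
    columns = list(batch)
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(_quote(c) for c in columns)
        conn.execute(f"CREATE TABLE batch ({cols})")
        marks = ", ".join("?" for _ in columns)
        # Transpose the column lists into row tuples for insertion.
        conn.executemany(
            f"INSERT INTO batch VALUES ({marks})",
            zip(*(batch[c] for c in columns)),
        )
        rows = []
        for col in columns:
            q = _quote(col)
            # COUNT(col) and COUNT(DISTINCT col) both ignore NULLs.
            count, distinct_count = conn.execute(
                f"SELECT COUNT({q}), COUNT(DISTINCT {q}) FROM batch"
            ).fetchone()
            # One row per distinct non-null value, with its occurrence count.
            for value, value_count in conn.execute(
                f"SELECT {q}, COUNT(*) FROM batch "
                f"WHERE {q} IS NOT NULL GROUP BY {q}"
            ).fetchall():
                rows.append({
                    "field_stats_field_name": col,
                    "field_stats_count": count,
                    "field_stats_distinct_count": distinct_count,
                    "field_stats_distinct_stats_distinct_value": str(value),
                    "field_stats_distinct_stats_count": value_count,
                })
        return rows
    finally:
        conn.close()
```

Persisting these rows into `<dataset>_pmeta` is left out. The sketch emits one set of rows per batch, and the UI query's `SUM` calls are what combine rows across batches.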
The UI calls the SQL query below to fetch the stats from this dataset:
```sql
SELECT
    field_name,
    field_count,
    distinct_count,
    distinct_value,
    distinct_value_count
FROM (
    SELECT
        field_stats_field_name AS field_name,
        field_stats_distinct_stats_distinct_value AS distinct_value,
        SUM(field_stats_count) AS field_count,
        field_stats_distinct_count AS distinct_count,
        SUM(field_stats_distinct_stats_count) AS distinct_value_count,
        -- rank values within each field by how often they occur
        ROW_NUMBER() OVER (
            PARTITION BY field_stats_field_name
            ORDER BY SUM(field_stats_distinct_stats_count) DESC
        ) AS rn
    FROM <dataset>_pmeta
    -- example field; substitute the field the UI is displaying
    WHERE field_stats_field_name = 'status_code'
      AND field_stats_distinct_stats_distinct_value IS NOT NULL
    GROUP BY
        field_stats_field_name,
        field_stats_distinct_stats_distinct_value,
        field_stats_distinct_count
) ranked
-- keep the top 5 values per field
WHERE rn <= 5
ORDER BY field_name, distinct_value_count DESC;
```
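To check the ranking logic end to end, here is a small sketch that loads hand-made rows for two batches into an in-memory SQLite table named `demo_pmeta`. `demo_pmeta`, `build_demo` and `top_values` are hypothetical names, SQLite stands in for DataFusion, and the column types are assumptions. The query takes the field name and the top-N limit as parameters and ranks values by their summed `field_stats_distinct_stats_count`.

```python
import sqlite3

# UI query in SQLite dialect, parameterized on field name and top-N limit.
TOP_VALUES_SQL = """
SELECT field_name, field_count, distinct_count, distinct_value, distinct_value_count
FROM (
    SELECT
        field_stats_field_name AS field_name,
        field_stats_distinct_stats_distinct_value AS distinct_value,
        SUM(field_stats_count) AS field_count,
        field_stats_distinct_count AS distinct_count,
        SUM(field_stats_distinct_stats_count) AS distinct_value_count,
        ROW_NUMBER() OVER (
            PARTITION BY field_stats_field_name
            ORDER BY SUM(field_stats_distinct_stats_count) DESC
        ) AS rn
    FROM demo_pmeta
    WHERE field_stats_field_name = ?
      AND field_stats_distinct_stats_distinct_value IS NOT NULL
    GROUP BY field_stats_field_name,
             field_stats_distinct_stats_distinct_value,
             field_stats_distinct_count
) ranked
WHERE rn <= ?
ORDER BY field_name, distinct_value_count DESC
"""


def top_values(conn, field, limit=5):
    # Return the `limit` most frequent values of `field` across all batches.
    return conn.execute(TOP_VALUES_SQL, (field, limit)).fetchall()


def build_demo():
    # Build a toy stats table: one row per (batch, field, distinct value).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE demo_pmeta (
            field_stats_field_name TEXT,
            field_stats_count INTEGER,
            field_stats_distinct_count INTEGER,
            field_stats_distinct_stats_distinct_value TEXT,
            field_stats_distinct_stats_count INTEGER)"""
    )
    # Per-value counts for status_code in two batches (6 distinct values each).
    batch_a = {"200": 6, "404": 3, "500": 2, "201": 2, "301": 1, "503": 1}
    batch_b = {"200": 4, "404": 4, "500": 3, "201": 2, "301": 2, "503": 1}
    for batch in (batch_a, batch_b):
        total = sum(batch.values())
        conn.executemany(
            "INSERT INTO demo_pmeta VALUES ('status_code', ?, ?, ?, ?)",
            [(total, len(batch), value, n) for value, n in batch.items()],
        )
    return conn
```

With this data, `top_values(build_demo(), "status_code")` returns five rows for `200`, `404`, `500`, `201` and `301` (summed counts 10, 7, 5, 4 and 3). `503`, with a summed count of 2, falls outside `rn <= 5`. Every row reports a `field_count` of 31 because both batches contain every value and `SUM(field_stats_count)` adds the two batch totals (15 + 16). For the same reason, ordering the window by `SUM(field_stats_count)` would not separate the values.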