I have a table of data (100k-1M rows). The most important attribute is “price”. Users split the total sum of prices into groups using custom queries that select rows from the table. The queries follow no pattern: their WHERE clauses are built from arbitrary conditions.
Question: how can I determine that a list of queries is complete, i.e. that every record in the table is selected by exactly one query? In other words, no record is omitted and no record is selected by more than one query.
Is there an efficient method to identify:
- which records (and how many) were omitted;
- which queries return duplicates, i.e. records that another query also selects?
My first thought is to simulate the query results with hash maps keyed by row ID: one for each query and one for the whole table. That seems very memory-demanding to me.
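
A minimal sketch of that idea, under stated assumptions: the data lives in an SQLite table called `items` with an integer `id` primary key, and each user query is available as a trusted WHERE-clause string. The table name, column names, and the `check_partition` and `demo` helpers are all hypothetical. It is also a slight variation on the approach described above: instead of one map per query, it keeps a single map from row ID to the list of queries that selected that row.

```python
import sqlite3
from collections import defaultdict


def check_partition(conn, table, where_clauses):
    """Return (omitted_ids, overlaps) for the given WHERE clauses.

    omitted_ids: row IDs selected by no query.
    overlaps: {row_id: [query indexes]} for rows selected by two or more queries.
    """
    hits = defaultdict(list)  # row ID -> indexes of the queries that selected it
    for i, where in enumerate(where_clauses):
        # Assumption: clauses are trusted strings and the table has an "id" column.
        for (row_id,) in conn.execute(f"SELECT id FROM {table} WHERE {where}"):
            hits[row_id].append(i)

    all_ids = [r for (r,) in conn.execute(f"SELECT id FROM {table} ORDER BY id")]
    omitted = [row_id for row_id in all_ids if row_id not in hits]
    overlaps = {row_id: qs for row_id, qs in hits.items() if len(qs) > 1}
    return omitted, overlaps


def demo():
    # Tiny in-memory table with made-up rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, 10.0, "a"), (2, 20.0, "b"), (3, 30.0, "a"), (4, 40.0, "c")],
    )
    queries = ["category = 'a'", "price >= 30"]
    return check_partition(conn, "items", queries)
```

With the sample data in `demo()`, row 2 is selected by neither query and row 3 is selected by both. The map holds one entry per selected row rather than one per row per query, so memory should grow with the table size and the amount of overlap, not with the number of queries multiplied by the table size.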