We recently fixed the mergedkey branch of v4 so that it runs dbjobs automatically after each insert_batch call. We did this because the fix_islatest query, now that it has been stripped of the process_status column, performs poorly if dbjobs is deferred until all the CSVs have been run through insert_batch. Each additional CSV loaded into signal_load increases the number of rows that participate in the fix_islatest join, yet a smaller and smaller percentage of those rows actually get updated.
We want to:
- Add a check to insert_batch that throws an exception if signal_load already contains any rows
- Add a test (an integration test if you set up the database, or a unit test if you mock out the database responses) confirming that insert_batch throws an exception when it is run while signal_load is not empty
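A minimal sketch of both items, written in Python as an assumption since the codebase language isn't stated here. It assumes insert_batch receives a DB-API-style connection exposing `execute`/`executemany`/`commit` (as `sqlite3` connections do) and that signal_load has two hypothetical columns, `signal_id` and `value`. The `insert_batch(conn, rows)` signature, the `ensure_signal_load_empty` helper, and `SignalLoadNotEmptyError` are illustrative names, not existing v4 code. It shows one mocked unit test and one integration-style test against an in-memory SQLite table standing in for the real database.

```python
import sqlite3
import unittest
from unittest import mock


class SignalLoadNotEmptyError(RuntimeError):
    """Hypothetical exception: insert_batch was called while signal_load still had rows."""


def ensure_signal_load_empty(conn):
    # Hypothetical guard. EXISTS stops at the first row, so it avoids a full COUNT(*) scan.
    row = conn.execute("SELECT EXISTS (SELECT 1 FROM signal_load)").fetchone()
    if row[0]:
        raise SignalLoadNotEmptyError(
            "signal_load is not empty; run dbjobs before the next insert_batch"
        )


def insert_batch(conn, rows):
    # Hypothetical signature; the real insert_batch in v4 may take different arguments.
    ensure_signal_load_empty(conn)  # refuse to load on top of leftover rows
    conn.executemany("INSERT INTO signal_load (signal_id, value) VALUES (?, ?)", rows)
    conn.commit()


class InsertBatchGuardTest(unittest.TestCase):
    def test_raises_when_signal_load_has_rows_mocked(self):
        # Unit-test style: the mocked connection reports that signal_load has a row.
        conn = mock.Mock()
        conn.execute.return_value.fetchone.return_value = (1,)
        with self.assertRaises(SignalLoadNotEmptyError):
            insert_batch(conn, [("a", 1.0)])
        conn.executemany.assert_not_called()  # nothing was inserted

    def test_raises_when_signal_load_has_rows_sqlite(self):
        # Integration style: an in-memory SQLite table stands in for the real database.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE signal_load (signal_id TEXT, value REAL)")
        insert_batch(conn, [("a", 1.0)])  # first batch succeeds on an empty table
        with self.assertRaises(SignalLoadNotEmptyError):
            insert_batch(conn, [("b", 2.0)])  # second batch sees leftover rows
        conn.close()
```

Run it with `python -m unittest` (pytest also collects `unittest.TestCase` classes). With a driver whose connection object has no `execute` shortcut, such as psycopg2, the guard would go through a cursor instead.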