Each database uses different strategies when evaluating a query, so it's hard to say exactly why this query is slow without seeing the actual plan. But it's a safe bet that it is doing a full table scan.
Your original query is:
Code:
select count(cat.id) as cat_col0
from cat
inner join animal on cat.id=animal.id, house
where house.id=10690 and
(cat.id in
(select a.id from house_animal, animal a where
house.id=house_animal.house_id and
house_animal.animal_id=a.id
)
)
The subquery in the "in" clause is where your full table scan is occurring. The db is not optimizing the query by limiting the house_animal table to the rows with house_animal.house_id = 10690. As a result, the db essentially executes a full inner join on the house_animal and animal tables (11M rows x the number of rows in animal), and only then restricts that result by house.id = 10690.
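One thing worth trying is stating the house_id restriction explicitly inside the subquery instead of relying on the correlation with the outer house table. Something like this (table and column names taken from your query; 10690 is just your example id, and I'm assuming house_animal.animal_id holds the same ids as cat.id):
Code:
select count(cat.id) as cat_col0
from cat
inner join animal on cat.id = animal.id
where cat.id in
    (select ha.animal_id
     from house_animal ha
     where ha.house_id = 10690)
Whether this helps depends on the optimizer, but it gives it a much more direct path: filter house_animal down to one house's rows first, then probe cat/animal by id.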
Why this is happening is tougher to determine. Query optimizers try to take the query structure and available indices into account to pick the most efficient plan. Maybe house_animal.animal_id has an index but house_animal.house_id doesn't, so the optimizer evaluates the former clause first (inner loop) and then filters on the second clause (outer loop).
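You can check both guesses directly: ask the db for its plan, and add the missing index if house_id turns out not to have one. In postgres I believe the commands look something like this (the index name is just a suggestion; check your db's docs for the exact explain syntax):
Code:
-- show the plan the optimizer actually chose for the subquery
explain analyze
select a.id
from house_animal, animal a
where house_animal.house_id = 10690
  and house_animal.animal_id = a.id;

-- if the plan shows a sequential scan of house_animal,
-- an index on house_id should let it seek instead
create index house_animal_house_id_idx on house_animal (house_id);
If the plan changes from a sequential scan of the 11M-row table to an index scan returning only one house's rows, that was the problem.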
Or maybe the optimizer just doesn't like the way this query is structured. The last resort for a query optimizer is always a full table scan. The problem with auto-generated queries is that they produce correct results but not often efficient ones, since they cater to the general case.
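If the optimizer still won't cooperate with the "in" form, rewriting it as a plain join sometimes produces a different plan. A sketch, with the caveat that this assumes house_animal has at most one row per (house_id, animal_id) pair -- if duplicates are possible you'd need count(distinct cat.id) to avoid inflating the count:
Code:
select count(cat.id) as cat_col0
from cat
inner join animal on cat.id = animal.id
inner join house_animal ha on ha.animal_id = cat.id
where ha.house_id = 10690
The join form and the "in" form are logically equivalent here, but some optimizers plan joins better than semi-joins.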
I have no experience with postgres, but this is fairly fundamental to most dbs. Hope this helps.