I thought I would use two rendering layers: a background map and, above it, a grid. The background map would contain only tiles and no objects; the grid would mimic each tile's height, and objects would be drawn on the grid. So: a static background and a dynamic grid foreground. So far there are a few problems I'm stuck on.
The first is the height of the grid tiles. I know tileWidth = 2 * tileHeight, but what about elevated tiles?
My idea was to raise a tile by half of tileHeight for each unit of height (height = 1 raises it by tileHeight / 2, and so on). That gives me this:

Zero height. Ignore the rock, and the green fill is only there because the mouse was hovering over that spot.

height = 1 works fine, mouse picking works.

But height = 2 is bad: the raised tile completely overlaps the top-most tile.
That's my first problem.
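To make the elevation rule above concrete, here is a minimal Python sketch of it. The 64x32 tile size, the standard diamond projection, and the names `TileMetrics` and `grid_to_screen` are assumptions made up for illustration; only the 2:1 ratio and the "half of tileHeight per height unit" rule come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileMetrics:
    # Assumed pixel sizes; the only fixed relation is tile_width = 2 * tile_height.
    tile_width: int = 64
    tile_height: int = 32


def grid_to_screen(col, row, height, m=TileMetrics()):
    """Project a grid cell to the screen position of its tile anchor.

    Assumes the usual diamond (2:1) isometric projection, then lifts the
    tile by half of tile_height for every unit of height.
    """
    x = (col - row) * m.tile_width // 2
    y = (col + row) * m.tile_height // 2
    y -= height * m.tile_height // 2  # elevation: half a tile height per level
    return x, y


if __name__ == "__main__":
    for h in range(3):
        print("height", h, "->", grid_to_screen(3, 3, h))
    print("neighbour (2, 2) at height 0 ->", grid_to_screen(2, 2, 0))
```

If the projection really is like this, a tile at (3, 3) with height = 2 ends up at exactly the same screen position as the height-0 tile at (2, 2), which would match the overlap I am seeing. That is only true under the assumed projection, though.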
And because of the two layers, I always have to clear two overlapping objects, like this:

It works fine, right? But the clearing algorithm also clears objects that are being translated. Suppose the rock is moving to the next x-index. Visually...

This happens! I think I need a pixel-based clearing algorithm, but how? A tile with height = 1 overlaps at least 3 of the tiles above it.
This here is my second problem.
Or you could question my two-layer concept and tell me how you folks do it?