Wednesday, December 2, 2009

Looking for tennis courts on aerial photos: how it works.

This is a description of how the search works.

The things I'll be looking for are basically just a bunch of lines on the ground (like tennis or basketball courts), so the first step is to convert the original image into an array of line segments (i.e. pairs of points, (x1, y1) - (x2, y2)). In order to do that, the image is converted into gray-scale and 1st and 2nd derivatives of each horizontal and vertical line are taken:

Note that these images aren't really binary 0-or-1 images; that's just the way OpenCV shows them. In fact, each pixel is a floating-point number holding the value of the derivative at the corresponding point.
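To make the derivative step concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names are made up for illustration, and the choice of a backward difference for the 1st derivative and a three-point stencil for the 2nd is an assumption, since the exact kernels aren't spelled out above.

```python
def derivatives(profile):
    """First and second finite differences of a 1-D brightness profile.

    Assumption: backward difference for the 1st derivative and the usual
    three-point stencil for the 2nd; border values are left at 0.
    """
    n = len(profile)
    first = [0.0] * n
    second = [0.0] * n
    for i in range(1, n):
        first[i] = float(profile[i] - profile[i - 1])
    for i in range(1, n - 1):
        second[i] = float(profile[i + 1] - 2 * profile[i] + profile[i - 1])
    return first, second


def image_derivatives(gray):
    """Apply derivatives() along every row and every column of a grayscale
    image given as a list of equally long rows."""
    by_row = [derivatives(row) for row in gray]
    by_col = [derivatives(list(col)) for col in zip(*gray)]
    dx1 = [d[0] for d in by_row]
    dx2 = [d[1] for d in by_row]
    # Column results are transposed back so they index as [y][x].
    dy1 = [list(r) for r in zip(*(d[0] for d in by_col))]
    dy2 = [list(r) for r in zip(*(d[1] for d in by_col))]
    return dx1, dx2, dy1, dy2
```

The four returned arrays correspond to the four derivative images: 1st and 2nd derivatives taken horizontally, then 1st and 2nd taken vertically.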

Now, I'm interested in thin bright lines (i.e. the court boundaries, usually made with white paint), so I need to find the spots where the 1st derivative is fairly large and positive (which means that the brightness is growing rather fast, transitioning from the green/blue/gray background to the bright-white court boundary line), and the 2nd derivative is fairly large in magnitude and negative (which means the 1st derivative is about to shrink and eventually turn negative, for the transition back from the white line to the darker background). Simultaneously scanning the 1st and 2nd derivatives for such spots produces the following:
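The scan itself boils down to a per-pixel test on the two derivative arrays. The sketch below shows only that test; the function name and both threshold values are hypothetical and would need tuning for real imagery.

```python
def bright_line_spots(d1, d2, min_slope=30.0, min_curvature=30.0):
    """Indices along one scan line where the brightness is rising fast
    (d1 > min_slope) while the slope is already being pulled back down
    (d2 < -min_curvature).

    d1 and d2 are the 1st and 2nd derivatives of the same row or column.
    Both thresholds are illustrative values, not taken from the project.
    """
    return [i for i, (slope, curvature) in enumerate(zip(d1, d2))
            if slope > min_slope and curvature < -min_curvature]
```

This per-pixel test on its own also fires on a plain dark-to-bright step edge; the sketch makes no attempt to rule that out.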

The left image is the result of scanning the horizontal derivatives, the right one - of the vertical derivatives. The horizontal derivatives are better for finding the vertical lines and vice versa, which is to be expected - e.g. in the extreme case, when horizontal derivatives are taken along a horizontal line, the brightness doesn't change, so there's no information to gain as both derivatives will be constant.

The next step is to build the actual lines: I first merge the neighboring pixels into little horizontal and vertical segments ("slices"). Then I start at one of those slices - let's say a horizontal one - and see if there's another one immediately underneath it, such that the projection of one of the second slice's ends falls within the first one. If there is such a slice, I connect the two slices' centers with a line and look further down, hoping to find a third slice that attaches to the 2nd one. If there is one, I discard the line between the 1st and 2nd slices and make a line between the 1st and the 3rd instead, checking that this new line still goes through the 2nd slice. The process continues until an intermediate slice is no longer crossed by the line being built - then the last line that goes through all intermediate slices is taken and added to the glorious array of lines.

Quite often there's a gap between slices, so that there's no slice immediately under the previous one. In this case, I look along the line built so far to see if there are good candidates further down. This lets me "mend" broken lines and makes for stronger, longer lines.
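Here is a rough sketch of the slice-chaining step for horizontal slices, without the gap mending. It assumes each slice is a tuple (y, x_start, x_end) and that the chain has already been collected top to bottom; these names and the data layout are made up for illustration and are not the project's actual structures.

```python
def attached(upper, lower):
    """True if `lower` sits on the row right under `upper` and at least one
    of its ends projects into `upper`."""
    y_up, a0, a1 = upper
    y_lo, b0, b1 = lower
    return y_lo == y_up + 1 and (a0 <= b0 <= a1 or a0 <= b1 <= a1)


def center(s):
    y, x0, x1 = s
    return ((x0 + x1) / 2.0, float(y))


def crosses(p, q, s):
    """True if the segment p-q passes through horizontal slice s.
    Assumes p and q lie on different rows."""
    y, x0, x1 = s
    (px, py), (qx, qy) = p, q
    t = (y - py) / (qy - py)
    x = px + t * (qx - px)
    return x0 <= x <= x1


def grow_line(chain):
    """chain: slices ordered top to bottom, each attached to the previous one.
    Returns ((start, end), used): the longest line from the first slice's
    center that still crosses every intermediate slice, and the number of
    slices it covers."""
    start = center(chain[0])
    best_end, used = start, 1
    for k in range(1, len(chain)):
        end = center(chain[k])
        if not all(crosses(start, end, s) for s in chain[1:k]):
            break  # an intermediate slice is missed: keep the previous line
        best_end, used = end, k + 1
    return (start, best_end), used
```

After `grow_line` returns, the search would presumably restart from the first slice the line did not cover, so that a bend in the chain produces two lines rather than one.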

Side note: OpenCV has an implementation of the Canny algorithm, but it tends to make somewhat jagged lines, especially when working on Terraserver's images, which aren't very sharp (and not particularly new). Halfway into the project I discovered that USGS has better (and public domain) imagery, for which the Canny algorithm may work better, but this slices sorcery was already implemented by then, so why throw away a working thing.
There's also a Hough transform implementation for making lines out of pixels, but that didn't work very well for me... Perhaps I need to play with it some more, although since this is a hobby project, it is more fun to make stuff up by myself.

Once I have the lines, the actual search begins: first I locate a pair of perpendicular line segments (L1, L2) that are close enough to each other to be a part of the model shape... I guess I haven't written about the model shapes yet, but it is basically a set of lines that make up what I want to find (e.g. 11 lines for a tennis court), with some of the corners marked as "key" points, which represent unique ways of applying the model shape to an intersection of two lines.

The model is rotated so that the lines adjacent to the "key" point are parallel to that pair of perpendicular lines (L1 and L2) I found in the previous paragraph and then moved to the intersection point of L1 and L2. Then, for each of the model shape's lines, I look for a "matching" line within the rest of the image. "Matching" means it is more or less parallel to the model shape's line (a leeway of pi/60, i.e. 3 degrees, is allowed) and, to quote the code comment (yes! I do have comments there ;)):

1. at least one of the shorter segment's ends is "within" the longer segment ("within" means the end's projection onto the longer segment lies within the segment)
2. distance(s) from the end(s) of the shorter segment that are "within" the longer segment to the longer segment are less than a threshold.
3. the shorter line's length is at least a certain portion of the longer one (like 40%)
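Put together, the parallelism check and the three conditions above might look roughly like this. It is a sketch, not the project's code: segments are assumed to be pairs of (x, y) points with non-zero length, the helper names are invented, and the distance threshold of 3 pixels is a placeholder (only pi/60 and the 40% figure come from the text).

```python
import math


def _length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def _angle(seg):
    """Direction of a segment as an angle in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def _project(p, seg):
    """Return (t, dist): t locates p's projection along seg (0 at the first
    end, 1 at the second); dist is p's distance to the line through seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    len2 = dx * dx + dy * dy
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / len2
    dist = abs((p[0] - x1) * dy - (p[1] - y1) * dx) / math.sqrt(len2)
    return t, dist


def segments_match(a, b, max_angle=math.pi / 60, max_dist=3.0, min_ratio=0.4):
    # Roughly parallel, treating directions as equal modulo pi.
    diff = abs(_angle(a) - _angle(b))
    if min(diff, math.pi - diff) > max_angle:
        return False
    short, long_ = sorted((a, b), key=_length)
    # Condition 3: the shorter segment is long enough relative to the longer.
    if _length(short) < min_ratio * _length(long_):
        return False
    # Condition 1: at least one end of the shorter segment projects "within"
    # the longer one. Condition 2: every such end is close to it.
    inside = [d for t, d in (_project(p, long_) for p in short)
              if 0.0 <= t <= 1.0]
    return bool(inside) and all(d < max_dist for d in inside)
```

Two collinear segments lying far apart on the same line fail condition 1, so they are not treated as a match even though they are perfectly parallel.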

If enough of the model shape's lines have a matching line in the image, then...

It's a match!
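For illustration, placing the model and deciding whether "enough" lines matched could be sketched as follows. The assumptions are mine: the key point sits at (0, 0) in model coordinates, the per-line test is passed in as a `matches` function, and the 0.6 fraction is a made-up value, since the actual cutoff isn't given here.

```python
import math


def place_model(model, angle, origin):
    """Rotate the model's segments by `angle` (radians) about the key point,
    assumed to be at (0, 0) in model coordinates, then move the key point
    to `origin` (the intersection of L1 and L2)."""
    c, s = math.cos(angle), math.sin(angle)
    ox, oy = origin

    def move(p):
        x, y = p
        return (ox + c * x - s * y, oy + s * x + c * y)

    return [(move(a), move(b)) for a, b in model]


def is_match(placed, image_lines, matches, min_fraction=0.6):
    """True if at least `min_fraction` of the placed model lines have some
    matching image line. `matches(model_line, image_line)` is whatever
    per-line test is in use; min_fraction is a hypothetical value."""
    found = sum(1 for m in placed
                if any(matches(m, line) for line in image_lines))
    return found >= min_fraction * len(placed)
```

Here `angle` would be the direction of L1, so that the model lines adjacent to the key point end up parallel to L1 and L2.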

A few more things are done that I didn't describe - e.g. the array of lines is sorted by the angle each line makes with the abscissa (the x axis), so that only a small portion of the array has to be traversed when looking for matches/perpendiculars. Also, to avoid false positives, I check how many lines fall completely within the model shape, and if there are too many, the match is discarded. This happens quite often when processing ocean shots, where the sun's reflection off the waves produces a lot of rather randomly-directed lines, so a "matching" line for the model shape's lines can almost always be found - something akin to "comb jamming", I guess.
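The angle-sorted array makes lookups by direction cheap, because a binary search can jump straight to the relevant slice. A minimal sketch of the idea, with a hypothetical `AngleIndex` class standing in for whatever the real code does:

```python
import bisect
import math


def angle_of(seg):
    """Angle with the x axis, folded into [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


class AngleIndex:
    """Segments kept sorted by direction for fast range queries."""

    def __init__(self, segments):
        pairs = sorted(((angle_of(s), s) for s in segments),
                       key=lambda pair: pair[0])
        self.angles = [a for a, _ in pairs]
        self.segments = [s for _, s in pairs]

    def _range(self, lo, hi):
        i = bisect.bisect_left(self.angles, lo)
        j = bisect.bisect_right(self.angles, hi)
        return self.segments[i:j]

    def near(self, angle, tol):
        """Segments whose direction is within `tol` of `angle`, modulo pi.
        Assumes tol is well below pi/2 so the wrapped ranges don't overlap."""
        angle %= math.pi
        lo, hi = angle - tol, angle + tol
        out = self._range(max(lo, 0.0), min(hi, math.pi))
        if lo < 0.0:          # window wraps below 0: also look just under pi
            out += self._range(lo + math.pi, math.pi)
        if hi > math.pi:      # window wraps above pi: also look just over 0
            out += self._range(0.0, hi - math.pi)
        return out
```

Candidates parallel to a line at angle `a` come from `near(a, tol)`, and perpendicular ones from `near(a + math.pi / 2, tol)`; either way only a narrow slice of the sorted array is touched.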

The next thing I'd like to try is finding the basketball courts, which pose some unique challenges, because they:

a) are of very varying sizes (and I'm not talking about different standard sizes, like "high school" vs "NCAA" vs "NBA" - it would seem that quite often these courts are drawn to take the available space, so if there's no room to make it as long as it should be, then it's just made shorter)

b) contain circles

Circles are probably a blessing in disguise, because even though they are harder to look for than straight lines, they seem to be mostly of standard size, at least the middle one and the half-circles on top of the free-throw line. I think what I need to do is find a suitable rectangle whose size is between the "high school" and "NBA" standards and then see if there are appropriate (half-)circles where they should be.

I also intend to make the source code available if anyone wants it; I just need to clean it up somewhat - mostly remove all the experimental stuff that just sits there unused.