.he `POLYSEE``Page %`
.fo `Max Clowes`- % -`November 1977`
.ce2
Scene Analysis
==============
.sp2
I want to make Roberts` program central to this discussion of Scene
Analysis for reasons that I hope will become clear. His account is
difficult to understand because of the use of matrix algebra to handle
the 3D recognition problem. So I`ve invented a simpler - much simpler! -
version of his task (it`s one that is attributed by Guzman to Canaday).
We have a world that consists of simple opaque flat shapes that can
overlap. Images like this:
.sp2
 		*****  *
 		*   * **
 		*   ** *
 		*   *  *
 		*   *  *
 		*****  *
 		 *     *
 		*      *
 	       *********
.sp
                 Fig. 1
.sp2
You might like to design some Turtle programs for square, triangle,
diamond etc. that will draw `complete` squares, triangles etc., but will
also, when the occasion demands it, portray occlusion of one shape by
another.
.pg
One way is to get Turtle, during the course of drawing a shape, to inspect
each picture point before it does Draw. If it finds ink then there is an
occluding object - so turn off the ink until the other side of that object
is encountered. That is the strategy I`ve adopted in the function squ();
in a library file called LIB OUTLINES.
What problems does it give rise to? Can you think of a principled and more
reliable method? Can you add TRI so as to be able to draw the
picture above? The problem is not trivial; it`s known in the jargon of
Computer Graphics as 'hidden line deletion`.
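.pg
The pixel-inspection strategy can be sketched away from the Turtle itself.
Below is a minimal Python sketch of the idea (the dict-of-points picture,
the function name and the ink characters are my own inventions, not the
squ() of LIB OUTLINES):

```python
# Minimal sketch of the "inspect before draw" occlusion strategy:
# the picture maps (x, y) to an ink character, and we refuse to
# draw over any point that is already inked.

def draw_segment(picture, points, ink="*"):
    """Draw a run of points, keeping the pen up over inked points."""
    for x, y in points:
        if (x, y) in picture:       # ink found: an occluding object
            continue                # pen stays up until blank paper returns
        picture[(x, y)] = ink       # blank paper: pen down, draw

picture = {(2, 0): "#"}             # a point already inked by another shape
draw_segment(picture, [(0, 0), (1, 0), (2, 0), (3, 0)])
print(sorted(picture.items()))      # the pre-existing ink at (2, 0) survives
```

The weakness the text asks about shows up here too: the tracer cannot tell
whether successive inked points belong to one occluding object or several.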
.pg
Roberts` program, having recognised
the object(s) in an image, was able to demonstrate its understanding by
presenting a different view. To be able to do that you (program or human!)
have to be able to
.br
(1) Delete lines that will not be visible from the new viewpoint and
insert new lines that will be.
.br
(2) Infer the existence of lines belonging to a recognised object even
though those lines are not visible because of inter-object or intra-object occlusion. The ability to do (1) presupposes
the successful accomplishment of (2).
.pg
This `inference' is present even in our simple laminar world. The missing
section of hypotenuse in Fig. 1 is perhaps fairly trivial, but suppose
more of the triangle were to be covered over. (How much can be covered
over without prejudicing the recognisability of the triangle?)
.pg
A lot depends on how many different kinds of object are permitted to occur:
if only one type of object - a triangle - can contain 45 degree angles,
then quite a lot can be covered up. Let`s assume, following Roberts, that
we have three shapes: squares, isosceles right-angled triangles, and
regular octagons. (He had a HEXagonal prism rather than our OCTagonal
lamina).
Let`s assume they can be any size and location, but let`s keep it simple
by assuming that they must always appear upright. (He didn`t).
So we`ve got something not unlike the letter recognition problem
but with a much smaller set of categories - 3(!) rather than 
the 26 or 52 or whatever. (In fact Roberts` program was not confined
to just these three kinds of OBJECT, because he devised a method for
seeing any object as a combination of these three models. For Roberts
a model is a 3D shape, not an object). If you`ve LOADed a copy of my
OUTLINES.P file (see SHELL demo) you will have drawn a pair of overlapping
squares - do a DISPLAY();. Following Roberts` 'scenario` we want to devise
a program
that will tell us what laminae are present in that picture, where they are and
draw 'completed` instances of each*.
('Scenario` is the name frequently given to a description of what an AI
program accomplishes or seeks to accomplish. Thus the 
scenario for ELIZA is "replies to utterances as would
a Rogerian Psychotherapist").
.pg
It would be nice if we could use the same TURTLE apparatus employed in
drawing these shapes to somehow recognise them too. That is, each of the
lamina-functions (SQU; TRIU; OCTU;) contains an implicit definition of the
shape it draws - a definition phrased in terms of TURTLE actions. We could
have three recognition functions called, say, RTRI, RSQ and ROCT, which
inspected the picture to see if there was a triangle or a square or an
octagon beginning in a specified
location and having
a specified size. The problem then becomes one of somehow finding out
from our given PICTURE data
WHICH
function to use, where to try it and how big its size parameter should be.
Roberts` version of that is "which of his basic models to try in the
picture, where the model is in space, how big and with what orientation".
His solution was to look for CUES in the picture which are characteristic
of one of his models. For example, a wedge model, if present, may disclose
a triangular face. A block should offer two or even three parallelograms
grouped round a FORK junction.
.pg
What cues are in our picture that we might use? How should we discover
those cues? Remember that it`s not just a matter, say, of noticing
that the angle of an ELL in the PICTURE is 45 degrees and therefore
a fragment of a TRI. We`ll need to know how big the TRI is and which
corner we`ve got hold of, i.e. 'is it a startcorner of RTRI?` And we`ve
got to remember that occlusion is going to mess things up too!
.bb
*N.B. Of course his 'input` was a graylevels picture like BASIC() in
Steve`s package PICPLAY. I`m assuming that we have somehow converted that
graylevels image into an outline... another file of mine called LAMINAE
contains turtle programs that draw filled-in laminae, and PICPLAY can be
used to 'edge` those pictures (see PICPLAY demo).
.tp10
.pg
A direction in which a solution lies is to use SEEPICTURE to describe
the picture. Minimally it gives us all the ELL`s in the picture, either
via its DATABASE or via the labels it deposits in the picture.
Roberts had a picture structure like DATABASE which he searched for
the cues characteristic of his models. He would look for a FORK whose 
three regions each had four lines..... and would then try to fit a cube to them.
In practice a cube may not disclose all three surfaces intact 
because of occlusion, so he might have to settle for just two
quadrilaterals. Our problem is easier and I`ve devised what seems to
me a simpler strategy. It is to collect together the DATABASE entries
comprising each visible boundary in the picture. In a sense I`m proposing
to 'segment` the database into parts each of which is the boundary of just one lamina. Then I`ll use each part
to decide what recognition function to try, where and with what size. (This
idea of segmenting the database of PICTURE cropped up in  the IDENTIFIER and
RECOGNISER demos as a way of thinking about pictures of text
where there are lots of letters in the image and where we`d like to take
them one at a time).
.sp
.sh
Searching the DATABASE.
======================
.hs
.pg
Use SEEPICTURE to analyse the picture drawn by my OUTLINES file, and
then give the commands:
 	: DISPLAY();		;;; to print the picture
 	: DATABASE ==>	;;; to print the DATABASE
.pg
What I plan to do is to trace a boundary by starting at a TEE and
stepping on to the ELL at the end of the TEEstem, then on to the next
ELL..... until I meet another TEE. To do that I`ve devised a function
FINDLAMINAE() in a file SEELAMINAE.P (see page 7) that finds a TEE, and
then repeatedly STEPONE-s, using first the TEEstem, then the ELL-line
delivered by STEPONE, and so on. Central to all that is the use of
PRESENT from the DATABASE package. You should try out PRESENT for
yourself, e.g.
.sp
 		VARS PT LINES;
 		PRESENT ([ELL ?PT ?LINES])=>
 		PT=>
 		LINES =>
 		VARS TYPE L1 L2;
 		PRESENT ([?TYPE ^PT [?L1 ?L2]]) =>
 		TYPE =>
 		L1=>
 		L2=>
.sp
The list pattern given to PRESENT reflects the structure of the junction
entries in the database. The use of this data structure to represent
the picture is of course a recurrent theme in these Vision demos,
and it constitutes the principal distinction between AI models
of perception and the various kinds of template, measures and feature
theories that have emerged from the Psychology lab. (It`s actually more
general than that.... all AI theories of intelligent behaviour will come
to have - if they don`t already have - structures that represent the task.
The structures have something in common with the linguist`s view of
language understanding, but typically are more complex and much more
closely tied to the procedures (programs) that manipulate those
structures). So it`s important that you get the 'feel` of these
operations on the database.
They are like the list processing operations you`ve already encountered -
they`re actually built on top of hd and tl - but much more
convenient to use.
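.pg
For readers without POP-2 to hand, the flavour of PRESENT can be sketched
in Python. This toy matcher and the sample entries are illustrative only,
not the DATABASE package itself:

```python
# A toy PRESENT: patterns are nested lists; "?name" binds the matching
# item. Entries mirror the junction format [type point [line1 line2]].

def match(pattern, item, bindings):
    """Recursively match pattern against item, filling in bindings."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        bindings[pattern[1:]] = item        # a pattern variable: bind it
        return True
    if isinstance(pattern, list):
        return (isinstance(item, list) and len(pattern) == len(item)
                and all(match(p, i, bindings) for p, i in zip(pattern, item)))
    return pattern == item                  # a constant: must be equal

def present(pattern, database):
    """Bindings for the first matching entry, or None if none matches."""
    for entry in database:
        bindings = {}
        if match(pattern, entry, bindings):
            return bindings
    return None

database = [["ell", [3, 4], [["l1"], ["l2"]]],
            ["tee", [7, 2], [["l3"], ["l4"]]]]
print(present(["ell", "?pt", "?lines"], database))
# binds pt to [3, 4] and lines to the ELL's pair of lines
```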
.pg
Try out FINDLAMINAE()=> on the picture drawn by OUTLINES. Before doing
so keep a copy of the list structure (called CONTEXT) generated by
SEEPICTURE, by executing:
.sp
 		VARS STORE;
 		CONTEXT -> STORE;
so that you can restore the database to its original form, without an
expensive call of SEEPICTURE again.
After you`ve run FINDLAMINAE take a look at the DATABASE again.
.sp
 		DATABASE ==>
.sp
It should be shorter, because FINDLAMINAE removes junctions as it 
finds them, 
so that it won`t keep on finding the same ELL over and over again.
.pg
You`ll need to extend FINDLAMINAE to find all the boundaries in the
database. The inner `until' loop (lines 54-58) is the basic process
to move you round a boundary, but you`ll need a different stopping
condition, because boundaries of un-occluded objects won`t terminate
in TEES - in fact they won`t terminate at all!
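.pg
A hedged Python sketch of that different stopping condition: for a closed
boundary the natural test is "back at the junction we started from". The
junction-to-junction successor map below stands in for the STEPONE
machinery and is my own invention:

```python
# Trace a closed boundary of ELLs: step from junction to junction and
# stop when we come back to the start, rather than when we meet a TEE.

def trace_boundary(start, next_junction):
    """Collect the junctions round a closed boundary, beginning at start."""
    boundary = [start]
    jn = next_junction[start]
    while jn != start:              # closed boundary: stop at the start
        boundary.append(jn)
        jn = next_junction[jn]
    return boundary

# four corners of an un-occluded square, each mapped to the next
square = {"a": "b", "b": "c", "c": "d", "d": "a"}
print(trace_boundary("a", square))  # visits every corner exactly once
```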
.pg
You might like to use your OUTLINES Turtle functions to draw
another picture for analysis. I found that my strategy for getting TEE
boundaries doesn`t always work, e.g. if a lamina occludes one object and
is occluded by another:-
.sp2
 		*********
 		*       *
 		*       *
 		*       1****2
 		*       *    *
 		*       *    *
 		*       *    3****
 		*********    *   *
 		     *       *   *
 		     *       *   *
 		     *********   *
 		        *        *
 			*        *
 			*        *
 			**********
                           Fig. 2
.sp
.sp2
Why is that? Can you devise a more robust strategy for extracting these
boundaries one at a time from the database? The issue here is not
simply(!) a programming one.
It has to do with what these types of junction and their characteristic
components (i.e. the STEM or CAP of a TEE) can tell us about their
'boundary membership'.
Thus if the attempt to traverse the database representation
of Fig 2 begins at the junction I've labelled 1, it will proceed to 2
then 3.
Should it stop here? ... it is a TEE.
Where should it stop?
The inner <until> loop of FINDLAMINAE tests the junction type with

 	until element(1, jn) = "tee"

Is that right?
.pg
The development of this line of thinking about the significance of
these junction categories for scene analysis leads directly into the
work of Guzman, Huffman, Clowes (1971), Waltz and Mackworth.
Before elaborating upon that let us round off discussion of
the mini-Roberts program.
.pg
We know that our three basic shapes are characterised by three different
kinds of angle:

 	square ........... all 90 degree angles
.br
 	triangle ......... one 90 degree angle, two 45 degree angles
.br
 	octagon .......... all 135 degree angles.

So we need to look at our boundary string to see what kinds of angle it
contains, to extract the length of a side between characteristic ELLs
(never TEEs!) to determine size, and to extract a PT so as to position
an attempt to draw it.
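.pg
Under the angle table above, a complete boundary settles the category at
once; here is a Python sketch of that decision (occluded boundaries, which
show only some of the angles, fall through to None):

```python
# Classify a lamina from the interior angles round its boundary,
# following the table: square 4 x 90, triangle 90 + 45 + 45,
# octagon 8 x 135.

def classify(angles):
    """angles: interior angles in degrees, in boundary order."""
    if len(angles) == 4 and all(a == 90 for a in angles):
        return "square"
    if sorted(angles) == [45, 45, 90]:
        return "triangle"
    if len(angles) == 8 and all(a == 135 for a in angles):
        return "octagon"
    return None                     # incomplete (occluded) or unknown

print(classify([90, 90, 90, 90]))   # square
print(classify([45, 90, 45]))       # triangle
```
.pg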
You may find it useful to try to find these items from the list that
FINDLAMINAE returns by assigning that list to CONTEXT, so that you can
use the DATABASE functions on it.
.pg
The DATABASE functions PRESENT, REMOVE etc all refer to the contents of the 
data structure as the list CONTEXT, so

 	FINDLAMINAE() -> CONTEXT;	will enable you to go on to
extract an ELL:

 	e.g.	PRESENT ([ELL ?PT [?LINE1 ?LINE2]]) -> ELL1;
.pg
To work out the angle that this ELL presents is quite complex ... you may
not know that the TURTLE package contains a useful function, TURNTO(x,y),
which causes the turtle to turn towards the point (x,y), thereby giving
the turtle a heading that you can get at:
.br
 	JUMPTO(1,1); TURNTO(5,5);
 	HEADING =>
.sp
So maybe we could extract the points P1, P2 in LINE1 and do
.sp
 		JUMPTO(dl(P1));
 		TURNTO(dl(P2));
 		HEADING =>
.sp
which is the direction of LINE1. There is obviously quite a lot more
work to do to devise procedures that will extract the relevant
facts from a boundary as a basis for an attempt to draw-recognise it.
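.pg
The direction of LINE1 can also be computed without moving a turtle at
all: it is the arctangent of the displacement from P1 to P2. A Python
sketch (I am assuming headings measured in degrees anticlockwise from the
positive x-axis, which may not be the TURTLE package`s convention):

```python
import math

def heading(p1, p2):
    """Direction from p1 towards p2, in degrees in [0, 360).
    Assumed convention: anticlockwise from the positive x-axis."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360

print(heading((1, 1), (5, 5)))      # the JUMPTO(1,1); TURNTO(5,5); example
```
.pg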
How should that draw-recognise be organised, bearing in mind once more
that we will have to cope with occlusion? Should we predict from the
data we`ve recovered where the corners of the hypothesised objects
should be in the picture, and then get TURTLE to examine those PICTURE
points for the occurrence there of an "l" (left there by SEELINES)?
That`s what Roberts did.
If they weren`t in the predicted places then he
would reject that hypothesis and try another, UNLESS the absence of the
predicted corner could be accounted for by occlusion evidence. Specifically,
that the predicted picture point lay on the continuation of a TEEstem.
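.pg
That predict-and-check loop, with occlusion as the only admissible excuse
for a missing corner, can be sketched in Python. The occluded predicate
stands in for the "lies on the continuation of a TEEstem" test and, like
the other names here, is hypothetical:

```python
# Roberts-style verification: accept a shape hypothesis if every
# predicted corner is either labelled "l" in the picture or is
# excusably occluded; reject it otherwise.

def verify(predicted_corners, picture, occluded):
    for pt in predicted_corners:
        if picture.get(pt) == "l":  # corner label left by the line finder
            continue
        if occluded(pt):            # missing, but explained by occlusion
            continue
        return False                # otherwise reject this hypothesis
    return True

picture = {(0, 0): "l", (4, 0): "l", (0, 4): "l"}
hidden = {(4, 4)}                   # a corner covered by another lamina
print(verify([(0, 0), (4, 0), (0, 4), (4, 4)], picture,
             lambda p: p in hidden))   # accepted: the absence is excused
```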
.sp
.sh
What do Junction Categories Tell Us?
=====================================
.hs
.pg
Roberts didn`t try to find all the bits of an object before attempting to
recognise it, only enough of a cue to suggest a hypothesis. If that
hypothesis succeeded he would then `delete' that problem from the
picture.... he`d recognised it. But the idea that we might be able to
`deliver' to his recognition apparatus bundles of picture data comprising
all the visible fragments of a body-to-be-recognised is attractive.
Guzman`s program SEE can be viewed as an attempt to do this. But of
course we can`t just run round `single' boundaries. (If you run
SEEPICTURE on the
PERFECT picture in PICPLAY - you`ll see the sort of data structures that
characterize images of truly 3D scenes. How would FINDLAMINAE fare on that?).
In fact Guzman`s insight was to recognise that it is a bundle of
regions that we need, and that the shapes of junctions are a good indicator of
which regions belong together. Falk re-did Roberts and used Guzman`s
bundle-of-regions finder as a preliminary step. He also tried to infer
- again by using evidence from picture junctions - whether one bundle
supported (in the gravity sense) another.
.pg
The seeming success of Guzman`s program was puzzling, and quite independently
both David Huffman (Huffman 1971) and I (Clowes 1971) came up with
an explanation. Read Huffman - he`s much clearer! There`s
much more to these labelling methods than mere bundle-finders. In fact
they have never been used like that - yet. The 'much more` comes out
of the fact that they 'understand` quite a bit about the appearances of
polyhedral scenes - an understanding that we can exhibit through their
rejection of some kinds of impossible object. That way of exposing what
such schemes 'understand` can be used to try to devise more powerful analyses.
Mackworth`s work (Mackworth CSL) is in that spirit. Waltz`s extension of the label types to handle shadows and cracks is further evidence that there
is more in this labelling than meets the eye! Although it`s not entirely
clear, it seems that what is happening here is an attempt to devise ways
of doing scene analysis that don`t rely on numbers, or x,y,z coordinates,
in the way that Roberts and Falk do. The problem is formidable because
it`s concerned with the fundamental question: how do we represent
three-dimensional objects to ourselves, and how are these objects
related to their appearances as images? One thing
is quite clear: whatever representations are used they must be structures, not numbers or names.
.sp 2
.nf
.nx /usr/lib/polysee.p
