.he 'GENPIC''Page %'
.fo 'Max Clowes'- % -'October 1977'
.mh
Characterising the Structure of Pictures
========================================
.hm
.pg
The position taken up at the end of IDENTIFIER is that recognising
a letter is better approached through the structure of the letter
than in terms of the values of some collection of 'measures' or
properties, although these are a more robust characterisation of
shape than letter 'templates'.
The letter structure exploited in IDENTIFIER is the one assigned by
SEEPICTURE. (Note, though, that SEEPICTURE is very sensitive to quite small
variations in letter geometry: try removing one or two elements of a
SAMPLE, e.g. at corners, and see what happens to the SEEPICTURE output.)
In this demo I want to focus on this idea of picture structure.

.sh
Grammar as Structure
====================
.hs
.pg
The role of structure in perception was the principal thesis of
Gestalt Psychology (see my notes 'Perception' for references).
The relation of 'parts' to 'wholes' is the essential feature of that thesis, and it
recurs in a remarkably clear form in the formal grammatical models
usually associated with the name of Chomsky.
SILLYSENT presents one of these ideas about syntactic structure, namely
the notion of a generative grammar.
Phrases are parts of a sentence; words are parts of a phrase.
The grammar not only specifies (i.e. generates) 'legal' sentences but
also shows how they are structured into parts and subparts, e.g.
Noun Phrase and Verb Phrase.
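.pg
As a reminder of what "generates and also structures" amounts to, here is a small sketch in Python (not the POP-11 used elsewhere in these notes) of a SILLYSENT-style generative grammar. The rules and vocabulary are invented for illustration and are not SILLYSENT's own. Each expansion records its symbol alongside its parts, so the process that produces a legal sentence also produces its structure.
.nf
```python
import random

# A toy generative grammar; the rules and vocabulary are illustrative only.
GRAMMAR = {
    "Sentence":   [["NounPhrase", "VerbPhrase"]],
    "NounPhrase": [["Article", "Noun"]],          # article followed by noun
    "VerbPhrase": [["Verb", "NounPhrase"]],
    "Article":    [["the"], ["a"]],
    "Noun":       [["boy"], ["dog"]],
    "Verb":       [["sees"], ["chases"]],
}

def generate(symbol, rng):
    """Expand symbol into a (symbol, parts) tree whose leaves are words."""
    if symbol not in GRAMMAR:                 # a word: nothing left to expand
        return symbol
    expansion = rng.choice(GRAMMAR[symbol])   # pick one of the symbol's rules
    return (symbol, [generate(part, rng) for part in expansion])

def words(tree):
    """The sentence itself: the leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, parts = tree
    return [word for part in parts for word in words(part)]

tree = generate("Sentence", random.Random(0))
print(" ".join(words(tree)))   # a legal sentence...
print(tree)                    # ...and its parts and subparts
```
.fi
The one-dimensional "followed by" of the sentence comes entirely from the left-to-right order of each expansion list, which is exactly what is at issue in a picture grammar.
.pg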
Partly in reaction to the fashions then dominant in Pattern Recognition in the
early 1960s (see references in IDENTIFIER), a number of people, notably
Kirsch (1964) and Narasimhan (but see also the Ledley citation and Minsky (1961)),
espoused what became known as "the linguistic approach to picture interpretation".
Narasimhan's early attempt (1963) to formulate this ("Syntactic descriptions of
pictures and Gestalt phenomena of visual perception" in CSL) is not easy to
follow in detail, partly because it is recognition oriented: his iterated
labelling process is intended to be a pictorial analogue of parsing.
Kirsch also favoured this labelling notion (he worked with Narasimhan), but he tried to
tighten up the analogy with his 'generative picture grammar' for right-angled
triangles!
The problem he was drawing attention to is easily overlooked
when thinking about a generative grammar for sentences.
It is that the words of a sentence can only be juxtaposed one-dimensionally,
that is, they are strung along a line one after another.
Picture elements, on the other hand, are distributed in two dimensions.
Moreover, the way in which the grammar controls the juxtaposition of words
is an accident of the fact that the grammar is itself expressed in
a string language.
Thus we find phrases like 'the boy', in which 'boy' follows 'the', because
there is a rule saying (roughly)
.sp
 	<Noun Phrase>::=<Article><Noun>

which means an article followed by a noun.
.sp
.br
Kirsch was asking two questions in posing his picture grammar:
.sp
 	(1) What replaces 'followed by' as the basic (?) kind of pictorial composition?
and
 	(2) Should the rules of a picture grammar be expressed as strings or in some other notation,
 	e.g. a two-dimensional 'pictorial' format?
.pg
There are many more questions one could ask here: what a grammar is, what
kinds of relations between words and between picture fragments need to be
specified by a grammar, what the (all important) relation is between grammatical structure
and meaning, and so on.
.pg
The idea of a grammar of pictures, in a less formal sense, is an important theme of Gombrich's
book 'Art and Illusion' (see Ch V).
The idea that we 'read' pictures, and that the development of artistic form is the development
of 'linguistic' forms, is skilfully argued there.
.pg
In 1966 Narasimhan and Reddy (CSL) published a generative picture grammar that offers a definitive
answer to Kirsch's questions at least.
Moreover, since the grammar addresses the structure of block capitals, it
dovetails neatly with the issues of letter recognition raised in IDENTIFIER.
The basic idea is exhibited in Fig 2 of their paper, reproduced here.
SGMMA and INVH are parts of a letter A, in this case an A with a straight top.
The rule for A in their grammar (Table 1 of their paper) reads like this:
.sp
 	A -> INVE.h (11,23;) | INVH.h (11,23;)
.sp
that is, there are two kinds of A: one involving an inverted Vee (the identifier INVE), and
one involving INVH, whose geometry is shown on the far right of Fig 2.
The composition of INVH is also given by a rule of the same kind:
.sp
 	INVH(1,2) -> SGMMA.l (31; 1; 2).
.pg
The meaning of this rule is exhibited pictorially in Fig 2.
The basic idea is that any picture fragment has one or more points of attachment (INVH has two),
and fragments are composed by citing subfragments (e.g. SGMMA and l) together with a specification
(e.g. 31) of which attachment points on each fragment are involved in the composition:
in this case point 3 of SGMMA and point 1 of l,
i.e.
 	INVH -> SGMMA.l (3 of (SGMMA) with 1 of (l)).
.sp
However, INVH must itself have points of attachment so that it in
turn may participate in further compositions, e.g. to make an A.
So the rule further states how the points of attachment on INVH are
derived from those of its components:
point 1 of INVH is point 1 of its first component SGMMA, and point 2
of INVH is point 2 of its second component l.
The notation used by Narasimhan is far from transparent: it uses '; '
to separate components and ', ' to separate mention of the attachment
points of a component.
To get this clear, look at the rule for SGMMA, which has three attachment
points, the first two of which are derived from r.
.pg
Thus a rule of composition states three things:
.sp
 	(i) the two subparts
.br
 	(ii) the way that the two parts are connected
.br
 	(iii) the set of points on the compound as some subset of the 
 	points on the parts.
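.pg
The three components of a rule of composition can be sketched in Python as follows. This is an illustration under stated assumptions, not Narasimhan and Reddy's formulation: a fragment is assumed to be a table of numbered attachment points plus a list of straight strokes, "is at" is taken to mean exact coincidence, and the names translate and compose are hypothetical. The ten-unit strokes, with attachment points at each end and in the middle, use the stroke lengths of the H grammar in these notes.
.nf
```python
def translate(fragment, dx, dy):
    """Shift every attachment point and stroke of a fragment by (dx, dy)."""
    def move(p):
        return (p[0] + dx, p[1] + dy)
    return {"points": {k: move(p) for k, p in fragment["points"].items()},
            "strokes": [(move(a), move(b)) for a, b in fragment["strokes"]]}

def compose(first, second, join, inherit):
    """(i) the two subparts first and second;
    (ii) join = (i, j): point i of first is at point j of second;
    (iii) inherit: (component, point) pairs naming the compound's points in order."""
    i, j = join
    (fx, fy), (sx, sy) = first["points"][i], second["points"][j]
    moved = translate(second, fx - sx, fy - sy)   # bring point j onto point i
    parts = (first, moved)                        # component 0 and component 1
    points = {n: parts[c]["points"][p] for n, (c, p) in enumerate(inherit, start=1)}
    return {"points": points, "strokes": first["strokes"] + moved["strokes"]}

# Strokes with points 1, 2, 3 at bottom/left, middle and top/right.
VERT = {"points": {1: (0, 0), 2: (0, 5), 3: (0, 10)}, "strokes": [((0, 0), (0, 10))]}
HORIZ = {"points": {1: (0, 0), 2: (5, 0), 3: (10, 0)}, "strokes": [((0, 0), (10, 0))]}

# PART: 2 of VERT is at 1 of HORIZ; its points are 1 of VERT and 3 of HORIZ.
PART = compose(VERT, HORIZ, join=(2, 1), inherit=[(0, 1), (1, 3)])
# An H: 2 of PART is at 2 of VERT; its points are 1 of PART and 1 of VERT.
H = compose(PART, VERT, join=(2, 2), inherit=[(0, 1), (1, 1)])
print(PART["points"])
print(H["strokes"])
```
.fi
Here PART gets points {1: (0, 0), 2: (10, 5)}, and the H's second point is simply the foot of its right-hand stroke; the AITCH rule below instead shifts that point 5 units to the right, which pure inheritance of points cannot express.
.pg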

SILLYSENT shows how a generative grammar can be implemented as a program.
We can do the same for Narasimhan's picture grammar, although (ii) and (iii) above
introduce significant complexities into the functions.
The following rules specify the structure of a simple H shape, using the
composition schema devised by Narasimhan but with an extended notation that,
hopefully, illuminates the procedural implementation of these rules:

.nf
 	*   *
 	*   *
 	*   *
 	*   *
 	*   *
 	*****	AITCH (aitch1, aitch2) => PART.VERT(2 of partinstance
 	*   *				is at 2 of vertinstance);
 	*   *				1 of partinstance -> aitch1;
 	*   *				1 of vertinstance+5x -> aitch2;
 	*   *
 	1   *......2


 	*
 	*
 	*
 	*
 	*
 	****2	PART(part1, part2) => VERT.HORIZ ( 2 of vertinstance
 	*			is at 1 of horinstance);
 	*			1 of vertinstance -> part1;
 	*			3 of horinstance -> part2;
 	*
 	1

 	3
 	*
 	*
 	*
 	2	VERT (vert1, vert2, vert3) => drawline (vert1, vert3)
 	*
 	*
 	*
 	1

 	1***2***3	HORIZ(horiz1, horiz2, horiz3)
 				=> drawline(horiz1, horiz3)
 
.fi
Each pattern morpheme (AITCH, PART, VERT, HORIZ) has a set of labelled points
associated with it.
The first two have two labels in their sets (aitch1, aitch2; part1, part2),
while VERT and HORIZ have three each; these are listed immediately
after the morpheme name on the left hand side of the rule.
On the right hand side we state the two morphemes which are to
be juxtaposed, e.g. PART.VERT.
All this follows the notation of LETER quite closely.
After these two components we state how they are to be juxtaposed,
in terms, of course, of labels belonging to each of the components.
The notation here is simply a more explicit version of that used in LETER.
Finally we state how the labels on the left hand side of the rule are
related to those of the components.
(N.B. The expression 5x in the rule for AITCH is shorthand for adding
5 units to the x coordinate of 1 of vertinstance.)
What this does is to define some structured objects, named labelsets,
whose elements (e.g. part1 and aitch1) have been declared 'equivalent'.
For us, that equivalence means 'the same place in the picture'.
Moreover, any given instance of an AITCH, PART, VERT or HORIZ will
be at some definite location in the picture: we will want to be able
to request an H in a particular place.
The grammar makes no provision for that, and it could be argued
that it doesn't need to.
But if our implementation is to result in actual turtle pictures of
letters, then it will have to handle x,y coordinate values so that
we can control letter position.
.pg
The functions that follow are very largely due to Aaron Sloman whose
help I am happy to acknowledge.
.nf
: turtle();
: operation 6 of num list;
:	;;;Define an operation of precedence 6, taking a number and
:	;;;a list as arguments.
: 	if num=1
: 	then hd(list)
: 	else num-1 of tl(list)
: 	close
: end;
: 
: function conspoint (x,y);
: 	[%x,y%]
: end;
: 
: function destpoint(point) => x y;
:	hd(point) ->x;
:	hd(tl(point)) ->y;
: end;
:
: function drawline(start,finish);
: 	jumpto(destpoint(start));
: 	drawto(destpoint(finish))
: end;
: 
: function horiz(label,point) => labelset;
: 	vars x y horiz1 horiz2 horiz3;
: 	destpoint(point) ->y ->x;
: 	if	label=1
: 	then
: 		point ->horiz1;
: 		conspoint(x+5,y) ->horiz2;
: 		conspoint(x+10,y) ->horiz3;
: 	elseif	label = 2
: 	then
: 		conspoint(x-5,y)  ->horiz1;
: 		point		  ->horiz2;
: 		conspoint(x+5,y)  ->horiz3;
: 	elseif	label = 3
: 	then
: 		conspoint(x-10,y) ->horiz1;
: 		conspoint(x-5,y)  ->horiz2;
: 		point		  ->horiz3;
: 	close;
: 	drawline(horiz1,horiz3);
: 	[%horiz1,horiz2,horiz3%] -> labelset
: end;
: 
: function vert(label,point) => labelset;
: 	vars x y vert1 vert2 vert3;
: 	destpoint(point) ->y ->x;
: 	if	label=1
: 	then
: 		point ->vert1;
: 		conspoint(x,y+5) ->vert2;
: 		conspoint(x,y+10) ->vert3;
: 	elseif	label = 2
: 	then
: 		conspoint(x,y-5) -> vert1;
: 		point		 -> vert2;
: 		conspoint(x,y+5) -> vert3;
: 	elseif	label = 3
: 	then
: 		conspoint(x,y-10) ->vert1;
: 		conspoint(x,y-5)  ->vert2;
: 		point		  ->vert3;
: 	close;
: 	drawline(vert1,vert3);
: 	[%vert1,vert2,vert3%] -> labelset
: end;
: 
: function part(label,point) => labelset;
: 	vars part1 part2 vertinstance horinstance;
: 	;;;in both branches 2 of vertinstance is at 1 of horinstance
: 	if	label=1
: 	then	vert(1,point) ->vertinstance;
: 		horiz(1, 2 of vertinstance) ->horinstance;
: 	elseif	label=2
: 	then
: 		horiz(3,point) ->horinstance;
: 		vert(2, 1 of horinstance) -> vertinstance;
: 	close;
: 	1 of vertinstance -> part1;
: 	3 of horinstance  -> part2;
: 	[%part1,part2%] -> labelset
: end;
: 
: function aitch(label,point) => labelset;
: 	vars x y aitch1 aitch2 vertinstance partinstance;
: 	if label=1
: 	then
: 		part(1,point) ->partinstance;
: 		vert(2,2 of partinstance) ->vertinstance;
: 		point ->aitch1;
: 		destpoint(1 of vertinstance) ->y ->x;
: 		conspoint(x+5,y) ->aitch2;
: 	elseif	label=2
: 	then
: 		'cannot draw backwards'  =>  setpop()
: 	close;
: 	[%aitch1,aitch2%] ->labelset
: end;
.fi
.pg
Each rule has been implemented as a function in just the same way that
SILLYSENT handles the proceduralisation of simple generative grammars
for string languages like English.
AITCH is a function with two arguments, a label and a point. Executing

 	aitch (1, conspoint(1,1))

asserts that the labelled point aitch1 is at coordinates
(1,1) in the turtle picture.
Each pattern morpheme works the same way: in the call of its function,
one of its labelled points is associated with a picture location.
Only one point from its label set need be so associated, because the
others can be inferred, and a good deal of what goes on inside each
function is concerned with that inference.
Each function returns a list of points: the coordinate values that all
its labels have taken on as a consequence of associating one of them with
a coordinate value when the function was called.
To appreciate what "as a consequence" means, look at PART.
Saying that part1 is going to be at, say, (1,1) doesn't by itself tell us
where part2 will be.
That is determined by HORIZ and VERT, in particular by the picture
lengths of those morphemes.
Once part has got back the labelsets of vert and horiz (which it assigns
to vertinstance and horinstance respectively), it can compute
part2, which it duly does, bundling up the result into the list
labelset as the final action of the function.
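.pg
For readers who want to experiment outside the POP-11 turtle system, here is a rough Python transliteration of vert, horiz, part and aitch. It is a sketch under assumptions: points are (x, y) tuples rather than two-element lists, and the turtle picture is replaced by a list, LINES, in which drawline simply records each line. The helper of mirrors the POP-11 operation of, so that 2 of vertinstance becomes of(2, vertinstance). The flow of inference is the same: each function is told where one of its labelled points is, works out the rest, draws, and returns its whole labelset.
.nf
```python
LINES = []  # stands in for the turtle picture: drawline records each line here

def of(n, labelset):
    """Mirror of the POP-11 operation 'n of list' (counting from 1)."""
    return labelset[n - 1]

def drawline(start, finish):
    LINES.append((start, finish))

def horiz(label, point):
    x, y = point
    # x coordinates of horiz1, horiz2, horiz3, given which label sits at point
    xs = {1: (x, x + 5, x + 10), 2: (x - 5, x, x + 5), 3: (x - 10, x - 5, x)}[label]
    labelset = [(v, y) for v in xs]
    drawline(of(1, labelset), of(3, labelset))
    return labelset

def vert(label, point):
    x, y = point
    ys = {1: (y, y + 5, y + 10), 2: (y - 5, y, y + 5), 3: (y - 10, y - 5, y)}[label]
    labelset = [(x, v) for v in ys]
    drawline(of(1, labelset), of(3, labelset))
    return labelset

def part(label, point):
    # In both branches: 2 of vertinstance is at 1 of horinstance.
    if label == 1:
        vertinstance = vert(1, point)
        horinstance = horiz(1, of(2, vertinstance))
    elif label == 2:
        horinstance = horiz(3, point)
        vertinstance = vert(2, of(1, horinstance))
    else:
        raise ValueError("part has only labels 1 and 2")
    return [of(1, vertinstance), of(3, horinstance)]

def aitch(label, point):
    if label != 1:
        raise ValueError("cannot draw backwards")
    partinstance = part(1, point)
    # 2 of partinstance is at 2 of vertinstance
    vertinstance = vert(2, of(2, partinstance))
    x, y = of(1, vertinstance)
    return [point, (x + 5, y)]   # aitch2 is 1 of vertinstance + 5x

print(aitch(1, (1, 1)))
print(LINES)
```
.fi
Calling aitch(1, (1, 1)) returns [(1, 1), (16, 1)] and leaves three lines in LINES: the left vertical from (1, 1) to (1, 11), the crossbar from (1, 6) to (11, 6), and the right vertical from (11, 1) to (11, 11).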
.pg
Check out the operation of aitch, part, vert and horiz for yourself.
You can compile these functions by typing
 	: LIB LETER;
.br
Make sure your turtle picture is big enough: the H that aitch draws from (1,1)
reaches x = 11 and y = 11.
Try
 	aitch (1, conspoint(1,1)) => 
 	display();

and

 	part (2, conspoint(12,8)) => 
 	display();

and the other functions too, in different picture positions.
Try tracing all the functions.
.pg
Of course we could eliminate a lot of this fuss if we used, in part
and aitch, our knowledge of the length of the strokes that horiz and vert
will draw.
But the bonus we get by doing it this way is that we can replace
vert and horiz with new procedures that draw quite different stroke
geometries.
They might have lengths determined by some global parameter, so as to
get sometimes large and sometimes small letters, which is useful in poster
generation.
We could have vert and horiz (and part too, for that matter) paint shapes
that depart in all sorts of ways from ideal straight lines.
They could be curved or dotted; they could be several picture points
wide (regions rather than lines); they could be outlined regions, perhaps even
images of three-dimensional blocks, if you can figure out how to do that!
The only requirement is that a redefined function should still be
called in the same way and should return the label set (three elements for
vert and horiz) needed by the functions that call these primitives.
.pg
This expresses in a concrete way the suggestions made in
IDENTIFIER (and in Clowes 1969) about strokes being interpretations of
ink patterns.
The definitions of HORIZ and VERT implemented here are these strokes: the
anatomy of a stroke is, minimally, just the labelset of its pattern
morpheme.
We can take this further, at least conceptually, by re-examining
what alternative implementations there might be for the
notion "is at" as it occurs in the grammar,

 	e.g. "2 of partinstance is at 2 of vertinstance."

The sense of "is at" implemented so far is simple-minded spatial
coincidence.
If we look at handprinted capitals, however, we quickly
see that coincidence is far from the norm; something like 'near' would be more appropriate.
How might that be implemented?
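.pg
One way to make the question concrete is to treat "is at" as a parameter of the rule rather than a fixed relation. The Python sketch below is our own assumption, not the method of the 'Picture Language Machines' paper: it takes 'near' to mean that both coordinates differ by at most a tolerance (a square neighbourhood), and checks the PART condition "2 of vertinstance is at 1 of horinstance" for a slightly careless hand-printed stroke.
.nf
```python
def coincide(p, q):
    """The simple-minded 'is at' used so far: exactly the same place."""
    return p == q

def near(p, q, tolerance=1):
    """A hypothetical 'is at': x and y each differ by at most tolerance."""
    return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance

def is_part(vertinstance, horinstance, is_at):
    """PART requires: 2 of vertinstance is at 1 of horinstance."""
    return is_at(vertinstance[1], horinstance[0])

# Labelsets of a hand-printed PART whose crossbar starts one unit to the
# right of, and one unit above, the middle of the vertical stroke.
vertinstance = [(1, 1), (1, 6), (1, 11)]
horinstance = [(2, 7), (7, 7), (12, 7)]

print(is_part(vertinstance, horinstance, coincide))   # False
print(is_part(vertinstance, horinstance, near))       # True
```
.fi
Choosing the tolerance is the hard part: make it too large and a crossbar that merely passes close to a stroke counts as joined to it.
.pg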
In my paper in 'Picture Language Machines', which was concerned with
recognition, I suggested how 'near' might work (see the Appendix of that paper),
but, more problematically, I pointed to the common practice of running
one stroke smoothly into another.
The result of this elision is to combine two notionally independent pattern
morphemes into one functional unit: a unit both in the sense of being
performed by the writer as a single movement, and in the sense of presenting,
to anything that has to recognise the letter, a seemingly unitary pattern
morpheme that actually needs to be segmented into two or more pieces.
From a certain point of view, handwriting is largely concerned with
what happens to the relatively simple, disjoint pattern morphemes of the
child's printing as the desire to write more quickly takes over.
Peter James' paper (CSL) explores this idea very largely in the context of
recognition, but you may like to consider how you would join up a sequence of
aitches, assuming that the turtle draws a line from its resting place
at the end of one aitch to its starting point at the beginning of the
next.
The problem gets interesting when you postulate different letters
beginning and ending with upward and downward strokes in an essentially
random manner.
That is what we actually have in handwriting, and it is the problem
to which James addressed himself.
.sh
Grammars as Recognition Devices
===============================
.hs
.pg
Questions of recognition are not necessarily illuminated by the
study of strictly generative schemata.
That was indeed one of the factors that prompted Winograd
(who had studied Linguistics under Chomsky) to discard
transformational generative grammar in favour of systemic grammar.
Systemic grammar makes available to a device that is trying to grasp the
structure of a string of words (a PARSER) far more information
to guide its search than a 'Chomskian' grammar does.
.pg
We can turn aitch into an 'aitch-recogniser' in a very
simple-minded way by replacing the function drawline with another
procedure that checks the turtle picture to see whether there is
a line between the pair of points presented to it, i.e.

 	function checkline (start, finish);
 	jumpto (destpoint(start));
 	inspecto (destpoint(finish))
 	end;

We could then SCAN the turtle picture looking for occurrences
of an aitch!
(That, roughly speaking, is what Marill's CYCLOPS did.)
That would be like testing every contiguous substring of a string
of words to see whether it was a noun phrase!
There are an awful lot of contiguous substrings in your average
sentence: a sentence of n words has n(n+1)/2 of them.
Similarly, every point of a turtle picture (400 of them in a 20 by 20
picture) is a place where an H could start, even if we ignore changes of
size and orientation.
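.pg
A Python sketch of this simple-minded recogniser follows. Its assumptions: the turtle picture is modelled as a set of inked (x, y) cells; only horizontal and vertical lines are handled; and the grammar functions take the line procedure as an extra argument, so that the same vert, horiz, part and aitch can either draw (ink the cells) or check (the role of checkline). The names cells, is_aitch_at and scan are hypothetical, and part handles only label 1 to keep the sketch short.
.nf
```python
def cells(start, finish):
    """The picture cells on a horizontal or vertical line, ends included."""
    (x1, y1), (x2, y2) = start, finish
    if x1 == x2:
        return [(x1, y) for y in range(min(y1, y2), max(y1, y2) + 1)]
    if y1 == y2:
        return [(x, y1) for x in range(min(x1, x2), max(x1, x2) + 1)]
    raise ValueError("only horizontal or vertical lines are handled")

def horiz(label, point, line):
    x, y = point
    xs = {1: (x, x + 5, x + 10), 2: (x - 5, x, x + 5), 3: (x - 10, x - 5, x)}[label]
    labelset = [(v, y) for v in xs]
    line(labelset[0], labelset[2])
    return labelset

def vert(label, point, line):
    x, y = point
    ys = {1: (y, y + 5, y + 10), 2: (y - 5, y, y + 5), 3: (y - 10, y - 5, y)}[label]
    labelset = [(x, v) for v in ys]
    line(labelset[0], labelset[2])
    return labelset

def part(point, line):
    vertinstance = vert(1, point, line)
    horinstance = horiz(1, vertinstance[1], line)
    return [vertinstance[0], horinstance[2]]

def aitch(point, line):
    partinstance = part(point, line)
    vertinstance = vert(2, partinstance[1], line)
    x, y = vertinstance[0]
    return [point, (x + 5, y)]

def is_aitch_at(picture, point):
    """Run the generator with a checking line procedure instead of a drawing one."""
    results = []
    def checkline(start, finish):
        results.append(all(c in picture for c in cells(start, finish)))
    aitch(point, checkline)
    return all(results)

def scan(picture, width, height):
    """Exhaustively try every point of the picture as aitch1."""
    return [(x, y) for x in range(width) for y in range(height)
            if is_aitch_at(picture, (x, y))]

picture = set()
aitch((1, 1), lambda start, finish: picture.update(cells(start, finish)))  # draw an H
print(scan(picture, 20, 20))
```
.fi
The scan prints [(1, 1)], but only after running the whole grammar at each of the 400 points.
.pg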
We can hope to be more sensible than that, at least with letters composed
of straight thin lines, by doing an initial pass with SEEPICTURE.
We can then retrieve all the END locations that it has found and use only
those as possible starting points for attempts to recognise a letter.
Indeed there are a lot of ways that we could use SEEPICTURE's
output to guide our search.
You might like to list some.
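.pg
Here is one way to approximate the END-location strategy in Python. We do not reproduce SEEPICTURE: as a stand-in (an assumption), an END is taken to be an inked cell with exactly one inked cell among its eight neighbours. The picture is the set of cells inked by aitch(1, conspoint(1,1)): verticals at x = 1 and x = 11 running from y = 1 to y = 11, joined by a crossbar at y = 6.
.nf
```python
def ends(picture):
    """Inked cells with exactly one inked cell among their eight neighbours."""
    found = []
    for x, y in sorted(picture):
        neighbours = sum((x + dx, y + dy) in picture
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
        if neighbours == 1:
            found.append((x, y))
    return found

# The cells inked by the H of aitch(1, conspoint(1,1)).
picture = ({(1, y) for y in range(1, 12)} | {(11, y) for y in range(1, 12)}
           | {(x, 6) for x in range(1, 12)})

candidates = ends(picture)
print(candidates)   # [(1, 1), (1, 11), (11, 1), (11, 11)]
```
.fi
That leaves four candidate starting points instead of every point of the picture, and one of them, (1, 1), is where aitch1 lies. The END test is crude: a thick or noisy stroke would defeat it.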
.pg
.sh
Pictorial Format
================
.hs
.pg
Narasimhan and Reddy go beyond generating letters in their paper: they use
their alphabet in a poster generator.
No details are given of the method; perhaps it isn't difficult.
The provision of a second labelled point for aitch, positioned
5 units to the right of the image, is intended to simplify
the generation of sets of printed letters.
Try:

 	newpicture (45, 45);
 	vars lastaitch; [EMPTY [1 1]]->lastaitch;
 	repeat 3 times
 		aitch (1, 2 of lastaitch) ->lastaitch
 	close;
 	display();

Clearly there's a problem about posters when one has only one letter in
the alphabet, but hopefully you can see how to progress from this to
turning modest texts into these turtle pictures.
Assume that the text is given as a list, so that the 3-aitch example above
would actually take the form of a call to a function posterise:

 	posterise ([h h h]);

.pg
You will need some conventions to denote separate words of your poster - perhaps they should
be sublists.
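.pg
A possible shape for posterise, sketched in Python under several assumptions: words are given as sublists of one-letter strings; letters are looked up in a table (here holding only h); each letter function returns its strokes together with its second labelled point, which becomes the starting point of the next letter; and an extra gap (WORD_GAP, an arbitrary choice) separates words. To keep the sketch short, this aitch builds in its stroke lengths rather than delegating to vert and horiz, which is exactly the shortcut that gives up replaceable strokes.
.nf
```python
WORD_GAP = 10  # hypothetical extra space between words, in picture units

def aitch(point):
    """An H with aitch1 at point; returns its strokes and aitch2 (5 units past the H)."""
    x, y = point
    strokes = [((x, y), (x, y + 10)),            # left VERT
               ((x, y + 5), (x + 10, y + 5)),    # HORIZ
               ((x + 10, y), (x + 10, y + 10))]  # right VERT
    return strokes, (x + 15, y)

LETTERS = {"h": aitch}  # a one-letter alphabet

def posterise(text, start=(1, 1)):
    """text is a list of words, each a list of letters, e.g. [["h", "h"], ["h"]]."""
    strokes, point = [], start
    for word in text:
        for letter in word:
            letter_strokes, point = LETTERS[letter](point)
            strokes.extend(letter_strokes)
        x, y = point
        point = (x + WORD_GAP, y)   # skip the gap before the next word
    return strokes

for stroke in posterise([["h", "h"], ["h"]]):
    print(stroke)
```
.fi
With these choices the third H of a single word starts at (31, 1), while an H that begins a second word starts 10 units further on than it otherwise would.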
.pg
The layout of the TIFR poster in Narasimhan's paper, where those
letters are set out vertically, is typical of the way that the meaning of a text can be
expressed in the format of a poster.
Titles of books as they appear in the frontispiece are often laid out in
ways that reflect the structure of the title's meaning.
Writing a program to make those decisions is, I'm afraid, beyond
the scope of this demo!
.pg
A much simpler task that involves many of these problems is sketched in
ARITH 2:
that of laying out simple arithmetical problems like column addition.
Both arithmetic and algebra abound with examples of pictorial
format being exploited to encode mathematical meanings.
Place notation is of course the most primitive instance of this.
But reflect on the complexity of the layout used in long division
(see ARITH 3). A 'complete' posterise task is much easier to formulate
here, e.g.
"Add up two hundred and thirty four, one hundred and sixteen, fifty nine and
nine hundred and one."
Laid out, it looks like

 	234
 	116
 	 59
      + 901
 	___
 	___

.pg
Can you see how to tackle that, assuming perhaps that the translation from
English number names has been accomplished by some other code? I.e.

 	posterise ([addup [2 3 4] [1 1 6] [5 9] [9 0 1]]);
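.pg
A minimal Python sketch of the layout half of that task, assuming the digits arrive as lists as in the call above (the function name addup is borrowed from that call; the rest is our own). It follows the layout shown: numbers right-aligned in columns, a plus sign against the last one, and two rules between which the answer is to be written. The rules are only as wide as the widest number, copying the example, although this particular total, 1310, would need a wider box.
.nf
```python
def addup(numbers):
    """Lay out a column addition; numbers is a list of digit lists."""
    rows = ["".join(str(d) for d in digits) for digits in numbers]
    width = max(len(row) for row in rows)
    lines = ["  " + row.rjust(width) for row in rows[:-1]]
    lines.append("+ " + rows[-1].rjust(width))   # the operator sits against the last number
    lines += ["  " + "_" * width] * 2             # the answer goes between these rules
    return lines

for line in addup([[2, 3, 4], [1, 1, 6], [5, 9], [9, 0, 1]]):
    print(line)
```
.fi
This prints the six lines of the layout above, with 59 right-aligned under 116.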

.pg
Algebra uses pictorial format in at least two different ways.
The algebraic expression

 	(a + b) :- c

can be written two-dimensionally:

 	a + b
 	_____
 	  c

Here division is expressed by a vertical ordering (the divisor is below the
dividend), while addition is encoded by horizontal juxtaposition.
The horizontal line that obligatorily separates numerator and denominator
seemingly stands for the division operator ' :- ' as well as for the brackets of the
linear version of the expression.
Drawing such a line and positioning the symbols calls for a much more difficult 'posterise' function.
.pg
The second way in which algebra uses format conventions is in the
positioning of superscripts and subscripts, and to this positioning convention is
added a change of size.
Here the exponentiation operator ** in POP11, as in

 	a ** 2

has no graphical counterpart in the two-dimensional layout.
(The limitations of our printer make it impossible to illustrate "a squared".)
The same is true of the subscript operation, portrayed in POP11 through the
use of brackets, i.e.

 	a(2)

.pg
The value of thinking about these arithmetical and algebraic conventions is that
we know what meanings are to be attributed to a successful recognition of the
format.
And, significantly, these meanings are always structured expressions.
So not only can a letter be seen to be a structure, but the
settings in which letters appear are themselves intelligible as structures too.
There seems to be a hierarchical exploitation of visual syntax at work!
While generating such layouts is hard, recognising them is even harder.
Anderson (CSL) pursued that problem using techniques that are very close to
those of Narasimhan's LETER, but with compositions involving invisible boxes drawn
around symbols and groups of symbols.
A typical group of symbols is the numerator a + b in the example above.
The horizontal line has to be as long as the box width of that expression.
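.pg
The box idea can be sketched in Python as follows. This is not Anderson's program: a box is assumed to be a list of text rows, and composition uses two hypothetical functions, beside (horizontal juxtaposition, standing for addition here) and over (a fraction: vertical ordering with a rule as wide as the wider box). Boxes placed side by side are simply top-aligned, a simplification a real layout would not accept.
.nf
```python
def width(box):
    return max(len(row) for row in box)

def atom(symbol):
    """The box around a single symbol."""
    return [symbol]

def beside(left, right, gap=" "):
    """Horizontal juxtaposition of two boxes (top-aligned, for simplicity)."""
    height = max(len(left), len(right))
    wl, wr = width(left), width(right)
    left = left + [""] * (height - len(left))
    right = right + [""] * (height - len(right))
    return [a_row.ljust(wl) + gap + b_row.ljust(wr) for a_row, b_row in zip(left, right)]

def over(numerator, denominator):
    """A fraction: the rule is as long as the wider of the two boxes."""
    w = max(width(numerator), width(denominator))
    return ([row.center(w) for row in numerator] + ["_" * w]
            + [row.center(w) for row in denominator])

numerator = beside(beside(atom("a"), atom("+")), atom("b"))   # the group "a + b"
for row in over(numerator, atom("c")):
    print(row)
```
.fi
The result is the three rows "a + b", "_____" and "  c  ": the rule takes its length from the box of the numerator, as the text requires.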
.sh
Format and Meaning
==================
.hs
.pg
The point of citing these additional layout tasks is to try to bring home the
pervasive use that is made of pictorial FORMAT in the communication of meanings.
That this goes well beyond the alphanumeric domain is vividly
demonstrated by Gombrich in 'Art and Illusion'.
It is also deeply implicated in what is now a classic task in the study of children's development,
Piaget's test for conservation of number.
Bryant's book is a modern, if somewhat partial, account of variations on that task.
See also Minsky & Papert for an insightful theoretical treatment of it.
The point of relevance to us is that the formatting conventions of the task
(two rows of beads or whatever) are taken for granted, whereas perhaps what is most important is to
grasp their problematic status.
For that is what the child has to do.
