Patrick Stevens

Former mathematics student at the University of Cambridge; now a software engineer.

The book Don’t Shoot the Dog, by Karen Pryor, contains a simple exercise that demonstrates clicker training. This is a very successful technique for producing behaviour in animals: having first associated the sound of a click with the reward of attention or food, one can then use the click as an immediate substitute for the reward. That makes it possible to train more complicated, time-critical actions through positive reinforcement, because a click is instant, whereas food or attention requires the trainer to approach the trainee.

The demonstration exercise involves one person designated the trainer and another designated the trainee. The trainer has a goal in mind, but cannot communicate that goal to the trainee; the only interaction allowed is a click when the trainee is doing something vaguely correct. As an example, the trainee can be made to move towards a light switch by a click when ey is pointing towards the switch, then a click when ey moves in that direction (ignoring any attempts to move in a different direction). The trainer then draws attention to the general area of the light switch by clicking whenever the trainee looks in the right direction, then for any hand movement, then for hand movement in the direction of the light switch. This kind of incremental reinforcement can be used to achieve all sorts of interesting behaviour. (I seem to remember, from Don’t Shoot the Dog, that it has been used on chickens to make them do hundred-step dances, although I may have misremembered that.)

The exercise, then, demonstrates the power of reinforcement to produce order from chaos. With one trainer and several trainees, I would imagine that the problem becomes harder, but not insurmountably so: click only when the person whose attention you need moves. It would take a while, but eventually I think I could train individual behaviour out of the group.

But what about one trainee and several trainers? Imagine a scenario in which a single trainee is alone in a room, with the clicks of two trainers coming through the door in such a way that the trainee hears only a click and cannot tell which trainer produced it. The two trainers have competing goals (or perhaps the same goal), and each performs the clicker-training procedure above. Would any useful behaviour result? I can imagine that an animal would get hopelessly confused by the competing goals, but a human might be able to get some kind of result. (We must assume in the contradictory case that the trainers have among their goals that “progress towards the opposing goal should be minimised”; that prevents them from teaming up to, say, perform the two goals sequentially.)

Imagine that one trainer aims to make the trainee do the Macarena, while the other trainer wishes the trainee to assume the lotus position. The goals are contradictory. I would imagine that the trainee would receive reinforcement towards being low down (in order to sit), as well as for standing straight and still (the starting position for the Macarena). I suspect that the trainee would infer some completely unrelated behaviour. I don’t know if there’s an official name for “excessively powerful inference” - pareidolia (the tendency to see faces in random settings) is a related phenomenon, and might cover this. I would be interested to know what behaviour would result from this kind of stimulus. Perhaps an experiment is in order (or, if you are also interested, do convey your results to me).
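As a very rough stand-in for that experiment, here is a toy simulation in Python. Everything in it is my own assumption rather than anything from Don’t Shoot the Dog: the trainee lives on a line of positions 0 to 10, each trainer's goal is a position on that line, a trainer clicks whenever a move brings the trainee closer to that trainer's goal, and the trainee simply becomes more likely to repeat whatever action was last clicked. The names `simulate`, `goals` and `weights` are hypothetical. It is a sketch of the set-up, not a model of how a real animal or human would respond.

```python
import random

def simulate(goals, steps=200, start=5, size=10, seed=0):
    """Toy model: a trainee on the line 0..size hears anonymous clicks."""
    rng = random.Random(seed)
    # Assumption: the trainee's "learning" is just a pair of action weights.
    weights = {"left": 1.0, "right": 1.0}
    position = start
    history = []  # (action, old position, new position, clicked) per step
    for _ in range(steps):
        # The trainee picks an action in proportion to its current weights.
        action = rng.choices(list(weights), weights=list(weights.values()))[0]
        step = -1 if action == "left" else 1
        new_position = min(size, max(0, position + step))
        # Each trainer clicks when the move brings the trainee nearer that
        # trainer's goal; the trainee hears only whether *someone* clicked.
        clicked = any(abs(new_position - g) < abs(position - g) for g in goals)
        if clicked:
            weights[action] += 1.0  # reinforce whatever was just clicked
        history.append((action, position, new_position, clicked))
        position = new_position
    return weights, history

one_trainer_weights, _ = simulate([10])        # a single trainer, goal at 10
two_trainer_weights, _ = simulate([0, 10])     # two trainers, opposing goals
print(one_trainer_weights, two_trainer_weights)
```

With a single trainer, only moves towards the goal are ever clicked, so the click carries information. With the two opposing goals at 0 and 10, every move that actually changes the trainee's position pleases one trainer or the other, so in this toy model the click says nothing about direction and the weights drift by chance. Whether a human trainee would read some unrelated pattern into that noise is exactly the part the simulation cannot answer.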