|
Positive reinforcement
Reinforcement can be positive or negative. Both types of
reinforcement make a response more likely in the future.
Reinforcement schedules can be continuous or intermittent, and
intermittent reinforcement can be delivered at either fixed or
variable rates.
Continuous reinforcement
A reward is delivered after each response. Animals learn fastest
when they are reinforced after every correct response. This is known
as a continuous reinforcement schedule.
Fixed Ratio (represented by the initials FR)
Associations can still be established with less frequent rewards,
that is, on higher schedules of reinforcement. This is hardly surprising since
wild animals may need to perform a given behaviour several times to
acquire food. It is this persistence that trainers rely upon when
they have established an association and wean their animals off
continuous reinforcement. On a fixed ratio schedule, a reward is
delivered after a fixed number of responses. A common code used in
learning protocols is FR5, which means the reward is delivered
immediately after every fifth response. On such a schedule the
animal often learns to predict
the pattern of food delivery and therefore appears to lose interest
by slowing down its response to stimuli after each reward while
beginning to pay keen attention only as it gets near to the end of
the fixed ratio (in this case, the fifth iteration of the response).
To avoid these peaks and troughs in responsiveness, variable ratio
schedules can be used.
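
The fixed ratio rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `fixed_ratio_rewards` and its parameters are hypothetical, not part of any training protocol, and a ratio of 1 reproduces continuous reinforcement.

```python
def fixed_ratio_rewards(n_responses, ratio=5):
    """Return one boolean per response: True where that response is rewarded.

    Hypothetical helper for illustration. With ratio=5 (FR5) every fifth
    response is rewarded; with ratio=1 every response is rewarded.
    """
    return [(i % ratio) == 0 for i in range(1, n_responses + 1)]


rewards = fixed_ratio_rewards(12, ratio=5)
# Collect the 1-based positions of the rewarded responses.
rewarded_at = [i for i, r in enumerate(rewards, start=1) if r]
print(rewarded_at)  # [5, 10]
```

Because the reward always falls on the same count, the pattern is fully predictable, which is the property the text links to the dip in responsiveness after each reward.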
Variable Ratio (represented by the initials VR)
A reward is delivered after a variable, unpredictable number of
responses. For example, VR5 means that although rewards are sometimes
delivered after ten responses, sometimes after twenty and sometimes
after one, the average number of responses required for a reward is
five. After a response has been established (this being achieved
quickest on a continuous reinforcement schedule), many trainers adopt
a variable ratio schedule in the knowledge that it is often very
difficult to reward responses every time they occur, especially if
they form part of public displays or competitions, or if they have to
occur at some distance from the trainer. Trained behaviours learned on
a variable ratio schedule are the most persistent, and they are
slower to extinguish than those resulting from fixed and continuous
schedules. This is because during training on a variable ratio many
responses may have had no consequences, so persistence is more likely
to be rewarded. Dogs that sometimes get titbits for begging at tables
take longer to give up once owners learn never to reward the
behaviour than dogs that have had continuous reinforcement.
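
One way to picture a VR5 schedule is to simulate how many responses are needed before each reward. The sketch below assumes that every response is rewarded independently with probability 1/5, which gives an average requirement of five. That model, the function name `variable_ratio_requirements` and the fixed seed are assumptions made for illustration; a trainer could equally draw the requirements from any other list that averages five.

```python
import random


def variable_ratio_requirements(n_rewards, mean_ratio=5, seed=0):
    """Simulate the number of responses needed to earn each reward.

    Assumption for illustration: each response is rewarded independently
    with probability 1 / mean_ratio, so requirements vary widely but
    average out at mean_ratio.
    """
    rng = random.Random(seed)
    requirements = []
    for _ in range(n_rewards):
        count = 1
        # Keep responding until a response happens to be rewarded.
        while rng.random() >= 1 / mean_ratio:
            count += 1
        requirements.append(count)
    return requirements


reqs = variable_ratio_requirements(10000)
average = sum(reqs) / len(reqs)
print(min(reqs), max(reqs), round(average, 1))
```

Individual requirements range from a single response to long unrewarded runs, while the long-run average stays close to five, so the animal cannot predict which response will pay off.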
Fixed Interval (FI)
A reward is delivered for the first response that occurs after a
fixed interval of time has passed since the last reward. For example,
FI5 means the reward is delivered for the first response after five
seconds have passed since the last reward.
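
The fixed interval rule can be sketched as follows. The function name `fixed_interval_rewards`, the sample response times and the choice to start the clock at time zero are assumptions made for illustration only.

```python
def fixed_interval_rewards(response_times, interval=5.0):
    """Return the times of the responses that earn a reward under FI.

    Hypothetical helper: a response is rewarded only if at least
    `interval` seconds have passed since the last reward.
    """
    rewarded = []
    last_reward = 0.0  # assumption: timing starts at zero seconds
    for t in sorted(response_times):
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t  # the interval restarts at each reward
    return rewarded


responses = [1, 2, 4, 6, 7, 9, 12, 13, 20]
print(fixed_interval_rewards(responses))  # [6, 12, 20]
```

Responses made before the interval has elapsed (here at 7, 9 and 13 seconds) earn nothing, however many of them occur.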
Variable Interval (VI)
A reward is delivered for the first response after a time interval
has elapsed since the last reward. The interval varies on a random
basis but averages out to a particular value. VI5 means that these
time intervals average out at five seconds and would range between
zero and ten seconds.
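
A variable interval schedule differs from a fixed interval one only in that the waiting time is redrawn after each reward. The sketch below draws each wait uniformly between zero and twice the mean, matching the zero-to-ten-second range given for VI5. The uniform distribution, the function name `variable_interval_rewards` and the fixed seed are assumptions made for illustration.

```python
import random


def variable_interval_rewards(response_times, mean_interval=5.0, seed=0):
    """Return the times of the responses that earn a reward under VI.

    Assumption for illustration: each required wait is drawn uniformly
    from 0 to 2 * mean_interval, so waits average mean_interval.
    """
    rng = random.Random(seed)
    rewarded = []
    last_reward = 0.0  # assumption: timing starts at zero seconds
    wait = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t - last_reward >= wait:
            rewarded.append(t)
            last_reward = t
            # Draw a fresh, unpredictable wait for the next reward.
            wait = rng.uniform(0, 2 * mean_interval)
    return rewarded


# An animal responding once per second for 1000 seconds.
steady_responses = list(range(1, 1001))
vi_rewards = variable_interval_rewards(steady_responses)
print(len(vi_rewards))
```

With one response per second the rewards arrive a little more than five seconds apart on average, because a reward can only be collected at the first response after each wait has elapsed.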
There is one other schedule worthy of mention, the differential
reinforcement of other behaviours. This is a schedule in which a
trainer chooses one behaviour that will not be reinforced. Instead
the trainer reinforces a variety of other behaviours. Predictably,
this approach causes the non-reinforced behaviour to drop out and is
often used to change problem behaviours. While this schedule
withholds reinforcement of the problem behaviour it still allows
reinforcement to be delivered. Withdrawing reinforcement completely
may not always be advisable, as there is a danger of removing all
incentives to respond in any way. Just as it is important to avoid
confusion and promote creativity when training a new behaviour, it
is imperative that an animal being trained to stop performing a
problem behaviour is simultaneously given the opportunity to
perform a more acceptable behaviour with a similar motivation. A dog
that chases joggers can most easily be trained to stop and look at
the handler if it associates the sight of a jogger with an
owner-centred ball game into which it can channel its motivation to
chase.
Partial reinforcement effect
The term partial reinforcement effect refers to both the increase in
performance under partial reinforcement schedules and the increased
resistance to extinction of responses that these produce when
compared with continuous reinforcement. Responses can be made highly
resistant to extinction by training up to very high partial
reinforcement schedules. Conversely, if trainers want to extinguish
a response, they do well to start by determining the schedule of
reinforcement used to establish the response. |