A potent means of reinforcing and shaping voluntary behavior.
You may have heard of Pavlov's dog - a dog that was trained to drool at the sound of a bell. Pavlov's dog is an example of classical conditioning, where the trained behavior is automatic. That is, the dog automatically drools at the sight of food, and in this experiment that automatic behavior was transferred to the sound of a bell.
With operant conditioning, we're dealing with voluntary behavior. Operant conditioning works by attaching consequences to a behavior: rewards (or punishments) are delivered according to a reinforcement schedule in order to elicit specific, controlled behaviors.
Operant conditioning is ubiquitous in our society, and it can be incredibly powerful. Some would say dangerously so: addictions such as gambling addiction are due in part to the nature of such conditioning. But operant conditioning is also used extensively in games (in fact, it may be all that's necessary to make a game compelling), and it is what makes many websites, such as Reddit, so pleasurable (maybe even addictive!). Once you become familiar with it, you'll notice how pervasive it is, underlying many of the systems we enjoy using so much.
Reinforcement vs Punishment
Reinforcement is anything that encourages a behavior. Anything that reinforces is a reinforcer. Punishment, on the other hand, is anything that discourages a behavior. Anything that punishes is a punisher.
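Both reinforcers and punishers can be divided into positive and negative. Positive simply means that something is being added or introduced; negative means that something is being removed. For example, giving a rat a food pellet when it presses a lever is positive reinforcement, while switching off an unpleasant noise when it presses the lever is negative reinforcement: both make lever pressing more likely.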
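The two distinctions combine into four cases. As a minimal sketch (the `Change` enum and `classify` function are illustrative names, not from any library), the naming rule depends on just two facts: whether a stimulus is added or removed, and whether the behavior becomes more or less frequent.

```python
from enum import Enum


class Change(Enum):
    ADDED = "positive"    # a stimulus is introduced
    REMOVED = "negative"  # a stimulus is taken away


def classify(change: Change, behavior_increases: bool) -> str:
    """Name a consequence from what happens to the stimulus and to the behavior."""
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{change.value} {kind}"


# Food pellet added, lever pressing goes up:
print(classify(Change.ADDED, True))     # positive reinforcement
# Alarm stops once the seatbelt is buckled, buckling goes up:
print(classify(Change.REMOVED, True))   # negative reinforcement
# Privileges taken away, the misbehavior goes down:
print(classify(Change.REMOVED, False))  # negative punishment
```

Note that "negative" never means "bad" here; negative reinforcement still encourages the behavior.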
The Skinner Box
The Skinner Box is the prototypical operant conditioning setup. You have a rat in a box with a lever. The behavior to be reinforced is lever pressing, and the reward is a food pellet.
In general, the rat presses the lever and is rewarded with a food pellet. The exact reward conditions depend on the reinforcement schedule in use; the schedules are described below.
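The reward loop can be sketched as a small simulation. This uses a hypothetical interface: a schedule is modeled as a function that receives the time of each lever press and returns whether that press earns a pellet, and `run_session` simply counts the pellets delivered. The `continuous` rule, which rewards every press, is the simplest case; the reinforcement schedules differ only in how they make this decision.

```python
from typing import Callable, Sequence

# Hypothetical interface: given the time (in seconds) of a lever press,
# a schedule decides whether that press earns a food pellet.
Schedule = Callable[[float], bool]


def run_session(press_times: Sequence[float], schedule: Schedule) -> int:
    """Feed each lever press to the schedule and count the pellets delivered."""
    pellets = 0
    for t in press_times:
        if schedule(t):
            pellets += 1  # reinforcement: drop a food pellet
    return pellets


def continuous(t: float) -> bool:
    """Simplest rule: every press is rewarded, regardless of timing."""
    return True


presses = [1.0, 2.5, 4.0, 7.0, 7.5]
print(run_session(presses, continuous))  # 5
```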
In operant conditioning, there are four basic patterns of reinforcement, known collectively as reinforcement schedules. Each reinforcement schedule has different impacts on response frequency; some are more effective than others.
Reinforcement schedules can vary in two ways:
Interval vs Ratio
Interval - Reinforcement is tied to the passage of time: the first response made after a certain amount of time has passed is reinforced. For example, the first lever press after each 10-minute interval earns a reward. The size of this interval is negatively related to response rate: larger intervals mean lower response rates. Reward magnitude, on the other hand, is positively related: bigger, better rewards mean higher response rates.
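A fixed-interval schedule can be sketched as follows, assuming the usual laboratory convention that time alone does not produce a pellet: the first press made after the interval has elapsed is reinforced, and the timer restarts at that reward. `FixedInterval` is a hypothetical name chosen for this illustration.

```python
class FixedInterval:
    """Reinforce the first press made at least `interval` seconds after
    the last reinforcement (or after the session started at t = 0)."""

    def __init__(self, interval: float):
        self.interval = interval
        self.last_reward = 0.0  # session starts at t = 0

    def __call__(self, t: float) -> bool:
        if t - self.last_reward >= self.interval:
            self.last_reward = t  # the timer restarts at the rewarded press
            return True
        return False  # too early: this press goes unrewarded


fi = FixedInterval(interval=600)  # 10 minutes, in seconds
presses = [30, 590, 605, 700, 1210, 1215]
print([fi(t) for t in presses])  # [False, False, True, False, True, False]
```

Pressing more often within an interval earns nothing extra, since at most one reward becomes available per interval.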
Ratio - Reinforcement is applied according to the subject's responses (that is, after a certain number of responses). For example, reinforcement is applied every 20 responses. Ratio reinforcement can be very effective, but it is limited by fatigue: the larger the required ratio, the more the subject tires before earning a reward.
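A fixed-ratio schedule only counts responses and ignores time. In this sketch (the `FixedRatio` class is a hypothetical name), every 20th press earns a pellet, so the number of rewards grows with the number of presses, however long they take.

```python
class FixedRatio:
    """Reinforce every `ratio`-th press; the timing of presses is ignored."""

    def __init__(self, ratio: int):
        self.ratio = ratio
        self.count = 0  # presses since the last reward

    def __call__(self) -> bool:
        self.count += 1
        if self.count == self.ratio:
            self.count = 0  # start counting toward the next reward
            return True
        return False


fr = FixedRatio(ratio=20)
rewards = [fr() for _ in range(100)]
print(sum(rewards))  # 5
print([i + 1 for i, r in enumerate(rewards) if r])  # [20, 40, 60, 80, 100]
```

Unlike the interval case, responding faster earns rewards faster, which is why ratio schedules can sustain high response rates until fatigue sets in.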
Fixed vs Variable
Fixed - Reinforcement will occur reliably when an interval or ratio is satisfied. For example, the reinforcement is applied every 20 responses exactly.
Variable - The interval or ratio varies unpredictably around an average value. For example, the reinforcement is applied, on average, every 20 responses - it could happen at 18 responses, or 22 responses, and so on. This unpredictability makes variable reinforcement very powerful, as we'll see later.
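A variable-ratio schedule can be sketched by redrawing the required number of presses after every reward. The uniform draw between `mean - spread` and `mean + spread` is an assumption chosen to match the 18-to-22 example above; `VariableRatio` is a hypothetical name, and the fixed seed is there only to make the run reproducible. A variable-interval schedule works the same way, except that the redrawn quantity is the waiting time rather than the press count.

```python
import random
from typing import Optional


class VariableRatio:
    """Reinforce after a number of presses that varies around `mean`.

    Assumption: each requirement is drawn uniformly from the integers in
    [mean - spread, mean + spread]; other distributions are also used.
    """

    def __init__(self, mean: int, spread: int, seed: Optional[int] = None):
        self.mean = mean
        self.spread = spread
        self.rng = random.Random(seed)
        self.count = 0                  # presses since the last reward
        self.required = self._draw()    # hidden target for the next reward

    def _draw(self) -> int:
        # randint is inclusive on both ends
        return self.rng.randint(self.mean - self.spread, self.mean + self.spread)

    def __call__(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()  # the subject cannot predict the next target
            return True
        return False


vr = VariableRatio(mean=20, spread=2, seed=1)
hits = [i + 1 for i in range(200) if vr()]        # press numbers that were rewarded
gaps = [b - a for a, b in zip([0] + hits, hits)]  # presses needed for each reward
print(gaps)  # every gap lies between 18 and 22
```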
From these we get four combinations of basic reinforcement schedules: fixed interval, fixed ratio, variable interval, and variable ratio. Their effectiveness at eliciting responses is shown in the accompanying graph.