Operant conditioning
==Concepts and procedures==

===Origins of operant behavior: operant variability===
Operant behavior is said to be "emitted"; that is, initially it is not elicited by any particular stimulus. Thus one may ask why it happens in the first place. The answer is like Darwin's answer to the question of the origin of a "new" bodily structure: variation and selection. The behavior of an individual varies from moment to moment, in such aspects as the specific motions involved, the amount of force applied, or the timing of the response. Variations that lead to reinforcement are strengthened, and if reinforcement is consistent, the behavior tends to remain stable. However, behavioral variability can itself be altered through the manipulation of certain variables.<ref>{{cite journal|last1=Neuringer|first1=A|year=2002|title=Operant variability: Evidence, functions, and theory|journal=Psychonomic Bulletin & Review|volume=9|issue=4|pages=672–705|doi=10.3758/bf03196324|pmid=12613672|doi-access=free}}</ref>

===Modifying operant behavior: reinforcement and punishment===
{{Main|Reinforcement|Punishment (psychology)}}
Reinforcement and punishment are the core tools through which operant behavior is modified. These terms are defined by their effect on behavior: "positive" and "negative" refer to whether a stimulus was added or removed, respectively, while "reinforcement" and "punishment" refer to the future frequency of the behavior. Reinforcement describes a consequence that makes a behavior occur more often in the future, whereas punishment is a consequence that makes a behavior occur less often.
<ref>{{cite book |last1=Cooper |first1=JO |last2=Heron |first2=TE |last3=Heward |first3=WL |title=Applied Behavior Analysis |date=2019 |publisher=Pearson Education (US) |isbn=978-0134752556 |pages=33 |edition=3rd}}</ref> There are four consequences in total:
# '''[[Positive reinforcement]]''' occurs when a behavior (response) results in a desired stimulus being added and increases the frequency of that behavior in the future.<ref name="Schultz">{{cite journal|year=2015|title=Neuronal reward and decision signals: from theories to data|journal=Physiological Reviews|volume=95|issue=3|pages=853–951|doi=10.1152/physrev.00023.2014|pmc=4491543|pmid=26109341|quote=Rewards in operant conditioning are positive reinforcers. ... Operant behavior gives a good definition for rewards. Anything that makes an individual come back for more is a positive reinforcer and therefore a reward. Although it provides a good definition, positive reinforcement is only one of several reward functions. ... Rewards are attractive. They are motivating and make us exert an effort. ... Rewards induce approach behavior, also called appetitive or preparatory behavior, and consummatory behavior. ... Thus any stimulus, object, event, activity, or situation that has the potential to make us approach and consume it is by definition a reward.|vauthors=Schultz W}}</ref> '''Example''': if a rat in a [[Skinner box]] gets food when it presses a lever, its rate of pressing will go up. Pressing the lever was positively reinforced.
# '''[[Negative reinforcement]]''' (a.k.a. escape) occurs when a behavior (response) is followed by the removal of an [[aversive]] stimulus, thereby increasing the original behavior's frequency. '''Example''': A child is afraid of loud noises at a fireworks display. They put on a pair of headphones and can no longer hear the fireworks. The next time the child sees fireworks, they put on the headphones. Putting on headphones was negatively reinforced.
# '''[[Positive punishment]]''' (also referred to as "punishment by contingent stimulation") occurs when a behavior (response) is followed by an aversive stimulus, making the behavior less likely to occur in the future. '''Example''': A child touches a hot stove and burns his hand. The next time he sees a stove, he does not touch it. Touching the stove was positively punished.
# '''[[Negative punishment]]''' (penalty) (also called "punishment by contingent withdrawal") occurs when a behavior (response) is followed by the removal of a stimulus, making the behavior less likely to occur in the future. '''Example''': When an employee puts their lunch in a communal refrigerator, it gets stolen before break time. The next time the employee brings a lunch to work, they do not put it in the refrigerator. Putting the lunch in the refrigerator was negatively punished.
* '''Extinction''' is a consequence strategy that occurs when a previously reinforced behavior is no longer reinforced with either positive or negative reinforcement. During extinction the behavior becomes less probable. Occasional (intermittent) reinforcement can make the behavior take longer to extinguish than reinforcement given at every opportunity, because the organism has learned that repeated responses are sometimes needed to obtain reinforcement.<ref>{{cite book |last1=Skinner |first1=B.F. |title=Science and Human Behavior |date=2014 |publisher=The B.F. Skinner Foundation |location=Cambridge, MA |page=70 |url=http://www.bfskinner.org/newtestsite/wp-content/uploads/2014/02/ScienceHumanBehavior.pdf |access-date=13 March 2019}}</ref> A study suggests that tactile feedback, such as haptic vibrations from mobile devices, can function as secondary reinforcers (i.e., learned rewards that acquire reinforcing value through association), strengthening consumer behaviors such as online purchasing.<ref>Hampton, W., & Morrin, M. (2025).
"When Touch Drives Purchase: Haptic Rewards as Reinforcers of Online Buying." Journal of Consumer Research. https://doi.org/10.1093/jcr/ucaf025</ref>

====Schedules of reinforcement====
Schedules of reinforcement are rules that control the delivery of reinforcement. The rules specify either the time at which reinforcement is to be made available, or the number of responses to be made, or both. Many rules are possible, but the following are the most basic and commonly used:<ref>Schacter, D. L., et al. (2011). ''Psychology'' (2nd ed.), pp. 280–284.</ref><ref name="ReferenceA"/>
* Fixed interval schedule: Reinforcement occurs following the first response after a fixed time has elapsed since the previous reinforcement. This schedule yields a "break-run" pattern of response; that is, after training on this schedule, the organism typically pauses after reinforcement and then begins to respond rapidly as the time for the next reinforcement approaches.
* Variable interval schedule: Reinforcement occurs following the first response after a variable time has elapsed since the previous reinforcement. This schedule typically yields a relatively steady rate of response that varies with the average time between reinforcements.
* Fixed ratio schedule: Reinforcement occurs after a fixed number of responses have been emitted since the previous reinforcement. An organism trained on this schedule typically pauses for a while after a reinforcement and then responds at a high rate. If the response requirement is low there may be no pause; if the response requirement is high the organism may quit responding altogether.
* Variable ratio schedule: Reinforcement occurs after a variable number of responses have been emitted since the previous reinforcement. This schedule typically yields a very high, persistent rate of response.
* Continuous reinforcement: Reinforcement occurs after each response.
Organisms typically respond as rapidly as they can, given the time taken to obtain and consume reinforcement, until they are satiated.

====Factors that alter the effectiveness of reinforcement and punishment====
The effectiveness of reinforcement and punishment can be changed.
# '''Satiation/Deprivation''': The effectiveness of a positive or "appetitive" stimulus is reduced if the individual has received enough of that stimulus to satisfy their appetite. The opposite effect occurs if the individual is deprived of that stimulus: the effectiveness of the consequence then increases. A subject with a full stomach would not feel as motivated as a hungry one.<ref name = Miltenberger84>Miltenberger, R. G. ''Behavior Modification: Principles and Procedures''. [[Thomson/Wadsworth]], 2008. p. 84.</ref>
# '''Immediacy''': An immediate consequence is more effective than a delayed one. If one gives a dog a treat for sitting within five seconds, the dog will learn faster than if the treat is given after thirty seconds.<ref>Miltenberger, R. G. ''Behavior Modification: Principles and Procedures''. [[Thomson/Wadsworth]], 2008. p. 86.</ref>
# '''Contingency''': To be most effective, reinforcement should occur consistently after responses and not at other times. Learning may be slower if reinforcement is intermittent, that is, following only some instances of the same response. Responses reinforced intermittently are usually slower to extinguish than responses that have always been reinforced.<ref name = Miltenberger84/>
# '''Size''': The size, or amount, of a stimulus often affects its potency as a reinforcer. Humans and animals engage in cost-benefit analysis. If a lever press brings ten food pellets, lever pressing may be learned more rapidly than if a press brings only one pellet. A pile of quarters from a slot machine may keep a gambler pulling the lever longer than a single quarter.

Most of these factors serve biological functions.
For example, the process of satiation helps the organism maintain a stable internal environment ([[homeostasis]]). When an organism has been deprived of sugar, for example, the taste of sugar is an effective reinforcer. When the organism's [[blood sugar]] reaches or exceeds an optimum level, the taste of sugar becomes less effective or even aversive.

====Shaping====
{{main|Shaping (psychology)}}
Shaping is a conditioning method often used in animal training and in teaching nonverbal humans. It depends on operant variability and reinforcement, as described above. The trainer starts by identifying the desired final (or "target") behavior. Next, the trainer chooses a behavior that the animal or person already emits with some probability. The form of this behavior is then gradually changed across successive trials by reinforcing behaviors that approximate the target behavior more and more closely. When the target behavior is finally emitted, it may be strengthened and maintained by the use of a schedule of reinforcement.

====Noncontingent reinforcement====
Noncontingent reinforcement is the delivery of reinforcing stimuli regardless of the organism's behavior.
Noncontingent reinforcement may be used in an attempt to reduce an undesired target behavior by reinforcing multiple alternative responses while extinguishing the target response.<ref>{{cite journal|last1=Tucker|first1=M.|last2=Sigafoos|first2=J.|last3=Bushell|first3=H.|year=1998|title=Use of noncontingent reinforcement in the treatment of challenging behavior|journal=Behavior Modification|volume=22|issue=4|pages=529–547|doi=10.1177/01454455980224005|pmid=9755650|s2cid=21542125}}</ref> As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".<ref>{{cite journal|last1=Poling|first1=A.|last2=Normand|first2=M.|year=1999|title=Noncontingent reinforcement: an inappropriate description of time-based schedules that reduce behavior|journal=Journal of Applied Behavior Analysis|volume=32|issue=2|pages=237–238|doi=10.1901/jaba.1999.32-237|pmc=1284187}}</ref>

===Stimulus control of operant behavior===
{{main|Stimulus control}}
Though initially operant behavior is emitted without an identified reference to a particular stimulus, during operant conditioning operants come under the control of stimuli that are present when behavior is reinforced. Such stimuli are called "discriminative stimuli", and the result is a so-called "[[three-term contingency]]": discriminative stimuli set the occasion for responses that produce reward or punishment. Examples: a rat may be trained to press a lever only when a light comes on; a dog rushes to the kitchen when it hears the rattle of its food bag; a child reaches for candy upon seeing it on a table.

====Discrimination, generalization & context====
Most behavior is under stimulus control. Several aspects of this may be distinguished:
* '''Discrimination''' typically occurs when a response is reinforced only in the presence of a specific stimulus.
For example, a pigeon might be fed for pecking at a red light and not at a green light; in consequence, it pecks at red and stops pecking at green. Many complex combinations of stimuli and other conditions have been studied; for example, an organism might be reinforced on an interval schedule in the presence of one stimulus and on a ratio schedule in the presence of another.
* '''Generalization''' is the tendency to respond to stimuli that are similar to a previously trained discriminative stimulus. For example, having been trained to peck at "red", a pigeon might also peck at "pink", though usually less strongly.
* '''Context''' refers to stimuli that are continuously present in a situation, such as the walls, tables, and chairs in a room, or the interior of an operant conditioning chamber. Context stimuli may come to control behavior as discriminative stimuli do, though usually more weakly. Behaviors learned in one context may be absent, or altered, in another. This may cause difficulties for behavioral therapy, because behaviors learned in the therapeutic setting may fail to occur in other situations.

===Behavioral sequences: conditioned reinforcement and chaining===
Most behavior cannot easily be described in terms of individual responses reinforced one by one. The scope of operant analysis is expanded through the idea of behavioral chains, which are sequences of responses bound together by the three-term contingencies defined above. Chaining is based on the experimentally demonstrated fact that a discriminative stimulus not only sets the occasion for subsequent behavior, but can also reinforce a behavior that precedes it. That is, a discriminative stimulus is also a "conditioned reinforcer". For example, the light that sets the occasion for lever pressing may be used to reinforce "turning around" in the presence of a noise. This results in the sequence "noise → turn-around → light → press lever → food".
Much longer chains can be built by adding more stimuli and responses.

===Escape and avoidance===
In escape learning, a behavior terminates an (aversive) stimulus. For example, shielding one's eyes from sunlight terminates the (aversive) stimulation of bright light in one's eyes. (This is an example of negative reinforcement, defined above.) Behavior that is maintained by preventing a stimulus is called "avoidance", as, for example, putting on sunglasses before going outdoors. Avoidance behavior raises the so-called "avoidance paradox": how can the non-occurrence of a stimulus serve as a reinforcer? This question is addressed by several theories of avoidance (see below). Two kinds of experimental settings are commonly used: discriminated and free-operant avoidance learning.

====Discriminated avoidance learning====
A discriminated avoidance experiment involves a series of trials in which a neutral stimulus such as a light is followed by an aversive stimulus such as a shock. After the neutral stimulus appears, an operant response such as a lever press prevents or terminates the aversive stimulus. In early trials, the subject does not make the response until the aversive stimulus has come on, so these early trials are called "escape" trials. As learning progresses, the subject begins to respond during the neutral stimulus and thus prevents the aversive stimulus from occurring. Such trials are called "avoidance" trials. This experiment is said to involve classical conditioning because a neutral CS (conditioned stimulus) is paired with the aversive US (unconditioned stimulus); this idea underlies the two-factor theory of avoidance learning described below.

====Free-operant avoidance learning====
In free-operant avoidance a subject periodically receives an aversive stimulus (often an electric shock) unless an operant response is made; the response delays the onset of the shock.
In this situation, unlike discriminated avoidance, no prior stimulus signals the shock. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S (shock-shock) interval, the time between successive shocks in the absence of a response. The second is the R-S (response-shock) interval, which specifies the time by which an operant response delays the onset of the next shock. Each time the subject performs the operant response, the R-S interval without shock begins anew.

====Two-process theory of avoidance====
This theory was originally proposed to explain discriminated avoidance learning, in which an organism learns to avoid an aversive stimulus by escaping from a signal for that stimulus. Two processes are involved: classical conditioning of the signal, followed by operant conditioning of the escape response:
:a) ''Classical conditioning of fear.'' Initially the organism experiences the pairing of a CS with an aversive US. The theory assumes that this pairing creates an association between the CS and the US through classical conditioning and, because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER) – "fear."
:b) ''Reinforcement of the operant response by fear-reduction.'' As a result of the first process, the CS now signals fear; this unpleasant emotional reaction serves to motivate operant responses, and responses that terminate the CS are reinforced by fear termination. The theory does not say that the organism "avoids" the US in the sense of anticipating it, but rather that the organism "escapes" an aversive internal state that is caused by the CS.
Several experimental findings seem to run counter to two-factor theory. For example, avoidance behavior often extinguishes very slowly even when the initial CS-US pairing never occurs again, so the fear response might be expected to extinguish (see [[Classical conditioning]]).
Further, animals that have learned to avoid often show little evidence of fear, suggesting that escape from fear is not necessary to maintain avoidance behavior.<ref name="Pierce 2004">Pierce & Cheney (2004). ''Behavior Analysis and Learning''.</ref>

====Operant or "one-factor" theory====
Some theorists suggest that avoidance behavior may simply be a special case of operant behavior maintained by its consequences. In this view the idea of "consequences" is expanded to include sensitivity to a pattern of events. Thus, in avoidance, the consequence of a response is a reduction in the rate of aversive stimulation. Indeed, experimental evidence suggests that a "missed shock" is detected as a stimulus and can act as a reinforcer. Cognitive theories of avoidance take this idea a step further. For example, a rat comes to "expect" shock if it fails to press a lever and to "expect no shock" if it presses it, and avoidance behavior is strengthened if these expectancies are confirmed.<ref name="Pierce 2004"/>

===Operant hoarding===
Operant hoarding refers to the observation that rats reinforced in a certain way may allow food pellets to accumulate in a food tray instead of retrieving them. In this procedure, retrieval of the pellets always instituted a one-minute period of [[Extinction (psychology)|extinction]] during which no additional food pellets were available, but those that had accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations in which there is a choice between a smaller food object right away and a larger food object after some delay. See [[schedules of reinforcement]].<ref>{{cite journal|last1=Cole|first1=M.R.|year=1990|title=Operant hoarding: A new paradigm for the study of self-control|journal=Journal of the Experimental Analysis of Behavior|volume=53|issue=2|pages=247–262|doi=10.1901/jeab.1990.53-247|pmid=2324665|pmc=1323010}}</ref>
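The ratio schedules of reinforcement described earlier can be stated precisely as rules mapping responses to reinforcement. The following sketch is illustrative only and not from the experimental literature; the function names are invented, and modeling the variable-ratio rule as a per-response Bernoulli draw with probability 1/mean is an assumption (interval schedules would additionally need a clock):

```python
import random

def fixed_ratio(n):
    """FR-n: reinforce every n-th response (hypothetical helper)."""
    count = 0
    def schedule(rng):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return schedule

def variable_ratio(mean):
    """VR-mean, approximated here by reinforcing each response with
    probability 1/mean, so the required count varies around `mean`."""
    def schedule(rng):
        return rng.random() < 1.0 / mean
    return schedule

def run_schedule(schedule, responses, seed=0):
    """Count reinforcements delivered over a run of responses."""
    rng = random.Random(seed)
    return sum(schedule(rng) for _ in range(responses))
```

For example, an FR-5 schedule delivers exactly one reinforcement per five responses (`run_schedule(fixed_ratio(5), 30)` gives 6), while a VR-5 schedule delivers roughly the same long-run rate but at unpredictable points, which is consistent with the persistent responding the article describes.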
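The S-S and R-S intervals of free-operant avoidance, described above, amount to a simple timing rule. The sketch below is a simplified model of the author's own construction (the function name, the choice to schedule the first shock one S-S interval into the session, and the session parameter are all assumptions; real procedures differ in detail):

```python
def shock_times(response_times, ss=5.0, rs=10.0, session=60.0):
    """Compute when shocks occur in a free-operant avoidance session.

    A shock is scheduled `ss` seconds after the previous shock (the S-S
    interval) unless a response occurs first; each response postpones the
    next shock to `rs` seconds after that response (the R-S interval).
    """
    shocks = []
    responses = sorted(response_times)
    i = 0
    next_shock = ss  # first shock one S-S interval into the session (assumption)
    while next_shock <= session:
        if i < len(responses) and responses[i] < next_shock:
            # a response before the scheduled shock resets the timer to R-S
            next_shock = responses[i] + rs
            i += 1
            continue
        shocks.append(next_shock)
        next_shock += ss  # following shocks recur at the S-S interval
    return shocks
```

With no responses, shocks recur every `ss` seconds; a single well-timed response (e.g. `shock_times([4.0], ss=5, rs=10, session=20)`) postpones the next shock to the longer R-S interval, which is why responding reduces the overall rate of aversive stimulation, the consequence emphasized by one-factor theory.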