Editing Learning classifier system (section)

== Methodology ==
The architecture and components of a given learning classifier system can be quite variable. It is useful to think of an LCS as a machine consisting of several interacting components. Components may be added or removed, or existing components modified/exchanged to suit the demands of a given problem domain (like algorithmic building blocks) or to make the algorithm flexible enough to function in many different problem domains. As a result, the LCS paradigm can be flexibly applied to many problem domains that call for [[machine learning]]. The major divisions among LCS implementations are as follows: (1) Michigan-style architecture vs. Pittsburgh-style architecture,<ref>[http://ryanurbanowicz.com/wp-content/uploads/2016/09/Urbanowicz_Browne_2015_Introducing-Rule-Based-Machine-Learning-A-Practical-Guide-GECCO15-CRC-Copy.pdf Introducing Rule-Based Machine Learning:  A Practical Guide], Ryan J. Urbanowicz and Will Browne, see pp. 72-73 for Michigan-style architecture vs. Pittsburgh-style architecture.</ref> (2) [[reinforcement learning]] vs. [[supervised learning]], (3) incremental learning vs. batch learning, (4) [[Online machine learning|online learning]] vs. [[offline learning]], (5) strength-based fitness vs. accuracy-based fitness, and (6) complete action mapping vs best action mapping. These divisions are not necessarily mutually exclusive. For example, XCS,<ref name=":10" /> the best known and best studied LCS algorithm, is Michigan-style, was designed for reinforcement learning but can also perform supervised learning, applies incremental learning that can be either online or offline, applies accuracy-based fitness, and seeks to generate a complete action mapping.

=== Elements of a generic LCS algorithm ===
[[File:Generic Michigan-style Supervised LCS Schematic.png|thumb|A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning]]
Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method, the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm. For simplicity let us focus on Michigan-style architecture with supervised learning. See the illustrations  on the right laying out the sequential steps involved in this type of generic LCS.

==== Environment ====
The environment is the source of data upon which an LCS learns. It can be an offline, finite [[Training data set|training dataset]] (characteristic of a [[data mining]], [[Statistical classification|classification]], or regression problem), or an online sequential stream of live training instances. Each training instance is assumed to include some number of ''features'' (also referred to as ''attributes'', or [[Dependent and independent variables|''independent variables'']]), and a single ''endpoint'' of interest (also referred to as the [[Class (set theory)|class]], ''action'', ''[[phenotype]]'', ''prediction'', or [[Dependent and independent variables|''dependent variable'']]). Part of LCS learning can involve [[feature selection]], therefore not all of the features in the training data need to be informative. The set of feature values of an instance is commonly referred to as the ''state''. For simplicity let's assume an example problem domain with [[Boolean data type|Boolean]]/[[Binary number|binary]] features and a [[Boolean data type|Boolean]]/[[Binary number|binary]] class. For Michigan-style systems, one instance from the environment is trained on each learning cycle (i.e. incremental learning). Pittsburgh-style systems perform batch learning, where rule sets are evaluated in each iteration over much or all of the training data.

==== Rule/classifier/population ====
A rule is a context dependent relationship between state values and some prediction. Rules typically take the form of an {IF:THEN} expression, (e.g.  {''IF 'condition' THEN 'action'},'' or as a more specific example, ''{IF 'red' AND 'octagon' THEN 'stop-sign'}''). A critical concept in LCS and rule-based machine learning alike, is that an individual rule is not in itself a model, since the rule is only applicable when its condition is satisfied. Think of a rule as a "local-model" of the solution space.

Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued, ordinal, continuous-valued). Given binary data LCS traditionally applies a ternary rule representation (i.e. rules can include either a 0, 1, or '#' for each feature in the data). The 'don't care' symbol (i.e. '#') serves as a wild card within a rule's condition allowing rules, and the system as a whole to generalize relationships between features and the target endpoint to be predicted. Consider the following rule (#1###0 ~ 1) (i.e. condition ~ action).  This rule can be interpreted as: IF the second feature = 1 AND the sixth feature = 0 THEN the class prediction = 1. We would say that the second and sixth features were specified in this rule, while the others were generalized. This rule, and the corresponding prediction are only applicable to an instance when the condition of the rule is satisfied by the instance. This is more commonly referred to as matching. In Michigan-style LCS, each rule has its own fitness, as well as a number of other rule-parameters associated with it that can describe the number of copies of that rule that exist (i.e. the ''numerosity''), the age of the rule, its accuracy, or the accuracy of its reward predictions, and other descriptive or experiential statistics.  A rule along with its parameters is often referred to as a ''classifier''.  In Michigan-style systems, classifiers are contained within a ''population'' [P] that has a user defined maximum number of classifiers. Unlike most [[stochastic]] search algorithms (e.g. [[evolutionary algorithm]]s), LCS populations start out empty (i.e. there is no need to randomly initialize a rule population). Classifiers will instead be initially introduced to the population with a covering mechanism.

In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier.  In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.

==== Matching ====
One of the most critical and often time-consuming elements of an LCS is the matching process. The first step in an LCS learning cycle takes a single training instance from the environment and passes it to [P] where matching takes place. In step two, every rule in [P] is now compared to the training instance to see which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules are moved to a ''match set'' [M]. A rule matches a training instance if all feature values specified in the rule condition are equivalent to the corresponding feature value in the training instance. For example, assuming the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain classifiers that propose conflicting actions.  In the fourth step, since we are performing supervised learning, [M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it proposes the correct action (based on the known action of the training instance), otherwise it goes into [I]. In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is not known.

==== Covering ====
At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step). Covering is a form of ''online smart population initialization''. Covering randomly generates a rule that matches the current training instance (and in the case of supervised learning, that rule is also generated with the correct action. Assuming the training instance is (001001 ~ 0), covering might generate any of the following rules:  (#0#0## ~ 0), (001001 ~ 0), (#010## ~ 0). Covering not only ensures that each learning cycle there is at least one correct, matching rule in [C], but that any rule initialized into the population will match at least one training instance. This prevents LCS from exploring the search space of rules that do not match any training instances.

==== Parameter updates/credit assignment/learning ====
In the sixth step, the rule parameters of any rule in [M] are updated to reflect the new experience gained from the current training instance. Depending on the LCS algorithm, a number of updates can take place at this step. For supervised learning, we can simply update the accuracy/error of a rule. Rule accuracy/error is different than model accuracy/error, since it is not calculated over the entire training data, but only over all instances that it matched. Rule accuracy is calculated by dividing the number of times the rule was in a correct set [C] by the number of times it was in a match set [M]. Rule accuracy can be thought of as a 'local accuracy'.  Rule fitness is also updated here, and is commonly calculated as a function of rule accuracy. The concept of fitness is taken directly from classic [[genetic algorithm]]s. Be aware that there are many variations on how LCS updates parameters in order to perform credit assignment and learning.

==== Subsumption ====
In the seventh step, a ''subsumption'' mechanism is typically applied. Subsumption is an explicit generalization mechanism that merges classifiers that cover redundant parts of the problem space.  The subsuming classifier effectively absorbs the subsumed classifier (and has its numerosity increased). This can only happen when the subsuming classifier is more general, just as accurate, and covers all of the problem space of the classifier it subsumes.

==== Rule discovery/genetic algorithm ====
In the eighth step, LCS adopts a highly elitist [[genetic algorithm]] (GA) which will select two parent classifiers based on fitness (survival of the fittest).  Parents are selected from [C] typically using [[tournament selection]]. Some systems have applied [[roulette wheel selection]] or deterministic selection, and have differently selected parent rules from either [P] - panmictic selection, or from [M]). [[Crossover (genetic algorithm)|Crossover]] and [[Mutation (genetic algorithm)|mutation]] operators are now applied to generate two new offspring rules. At this point, both the parent and offspring rules are returned to [P]. The LCS [[genetic algorithm]] is highly elitist since each learning iteration, the vast majority of the population is preserved. Rule discovery may alternatively be performed by some other method, such as an [[estimation of distribution algorithm]], but a GA is by far the most common approach. Evolutionary algorithms like the GA employ a stochastic search, which makes LCS a stochastic algorithm. LCS seeks to cleverly explore the search space, but does not perform an exhaustive search of rule combinations, and is not guaranteed to converge on an optimal solution.

==== Deletion ====
The last step in a generic LCS learning cycle is to maintain the maximum population size. The deletion mechanism will select classifiers for deletion (commonly using roulette wheel selection). The probability of a classifier being selected for deletion is inversely proportional to its fitness. When a classifier is selected for deletion, its numerosity parameter is reduced by one. When the numerosity of a classifier is reduced to zero, it is removed entirely from the population.

==== Training ====
LCS will cycle through these steps repeatedly for some user defined number of training iterations, or until some user defined termination criteria have been met. For online learning, LCS will obtain a completely new training instance each iteration from the environment. For offline learning, LCS will iterate through a finite training dataset. Once it reaches the last instance in the dataset, it will go back to the first instance and cycle through the dataset again.

==== Rule compaction ====
Once training is complete, the rule population will inevitably contain some poor, redundant and inexperienced rules. It is common to apply a ''rule compaction'', or ''condensation'' heuristic as a post-processing step. This resulting compacted rule population is ready to be applied as a prediction model (e.g. make predictions on testing instances), and/or to be interpreted for [[knowledge discovery]].

==== Prediction ====
Whether or not rule compaction has been applied, the output of an LCS algorithm is a population of classifiers which can be applied to making predictions on previously unseen instances. The prediction mechanism is not part of the supervised LCS learning cycle itself, however it would play an important role in a reinforcement learning LCS learning cycle. For now we consider how the prediction mechanism can be applied for making predictions to test data. When making predictions, the LCS learning components are deactivated so that the population does not continue to learn from incoming testing data. A test instance is passed to [P] where a match set [M] is formed as usual. At this point the match set is differently passed to a prediction array. Rules in the match set can predict different actions, therefore a voting scheme is applied.  In a simple voting scheme, the action with the strongest supporting 'votes' from matching rules wins, and becomes the selected prediction. All rules do not get an equal vote. Rather the strength of the vote for a single rule is commonly proportional to its numerosity and fitness. This voting scheme and the nature of how LCS's store knowledge, suggests that LCS algorithms are implicitly ''ensemble learners''.

==== Interpretation ====
Individual LCS rules are typically human readable IF:THEN expression. Rules that constitute the LCS prediction model can be ranked by different rule parameters and manually inspected. Global strategies to guide knowledge discovery using statistical and graphical have also been proposed.<ref name=":11" /><ref name=":12" /> With respect to other advanced machine learning approaches, such as [[artificial neural network]]s, [[random forest]]s, or [[genetic programming]], learning classifier systems are particularly well suited to problems that require interpretable solutions.