
==Procedure==
There are several steps in conducting MDS research:
# '''Formulating the problem''' – Which variables are to be compared? How many variables are there? What is the purpose of the study?
# '''Obtaining input data''' – In one common design, respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7-point [[Likert scale]] from very similar to very dissimilar). The first question could be for Coke/Pepsi, the next for Coke/Hires Root Beer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires Root Beer, and so on. The number of questions is a function of the number of brands and can be calculated as <math>Q = N (N - 1) / 2</math>, where ''Q'' is the number of questions and ''N'' is the number of brands; for example, 4 brands require 6 questions. This approach is referred to as the “perception data: direct approach”. There are two other approaches. In the “perception data: derived approach”, products are decomposed into attributes that are rated on a [[semantic differential]] scale. In the “preference data approach”, respondents are asked for their preference rather than for similarity.
# '''Running the MDS statistical program''' – Software for running the procedure is available in many statistical software packages. Often there is a choice between metric MDS (which deals with interval or ratio level data) and non-metric MDS<ref>{{cite journal|first1=J. B.|last1=Kruskal| author-link=Joseph Kruskal| title=Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis|journal=Psychometrika|pages=1–27| volume=29| issue=1| year=1964| doi=10.1007/BF02289565|s2cid=48165675}}</ref> (which deals with ordinal data).
# '''Deciding the number of dimensions''' – The researcher must decide on the number of dimensions the program should produce. Interpretability of the MDS solution is often important, and lower-dimensional solutions are typically easier to interpret and visualize. However, dimension selection is also a matter of balancing underfitting and overfitting. Lower-dimensional solutions may underfit by leaving out important dimensions of the dissimilarity data. Higher-dimensional solutions may overfit to noise in the dissimilarity measurements. Model selection tools like [[Akaike information criterion|AIC]], [[Bayesian information criterion|BIC]], [[Bayes factors]], or [[Cross-validation (statistics)|cross-validation]] can thus be useful for selecting the dimensionality that balances underfitting and overfitting.
# '''Mapping the results and defining the dimensions''' – The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The proximity of products to each other indicates either how similar they are or how preferred they are, depending on which approach was used. How the dimensions of the embedding actually correspond to dimensions of system behavior, however, is not necessarily obvious. Here, a subjective judgment about the correspondence can be made (see [[perceptual mapping]]).
# '''Testing the results for reliability and validity''' – Compute [[R-squared]] to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-squared of 0.6 is considered the minimum acceptable level.{{Citation needed|date=February 2011}} An R-squared of 0.8 is considered good for metric scaling and 0.9 is considered good for non-metric scaling. Other possible tests are Kruskal’s stress, split data tests, data stability tests (e.g., eliminating one brand), and test-retest reliability.
# '''Reporting the results comprehensively''' – Along with the mapping, at least the distance measure (e.g., [[Sorenson index|Sørensen index]], [[Jaccard index]]) and the reliability (e.g., stress value) should be given. It is also advisable to report the algorithm (e.g., Kruskal, Mather), which is often determined by the program used (naming the program sometimes replaces the algorithm report); whether a start configuration was supplied or chosen at random; the number of runs; the assessment of dimensionality; the [[Monte Carlo method]] results; the number of iterations; the assessment of stability; and the proportion of variance accounted for by each axis (R-squared).
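
The steps above can be illustrated with a minimal sketch in Python using NumPy. It is not the procedure of any particular statistical package: it assumes classical (Torgerson) metric MDS as the algorithm, computes Kruskal's stress-1 directly on the raw dissimilarities (with no monotone regression, so it applies to the metric case only), and takes R-squared to be the squared correlation between the input dissimilarities and the fitted distances. The four-item dissimilarity matrix and all function names (<code>n_pairs</code>, <code>classical_mds</code>, <code>stress1</code>, <code>r_squared</code>) are hypothetical and chosen for illustration.

```python
import numpy as np


def n_pairs(n_items):
    """Number of pairwise similarity questions, Q = N(N - 1) / 2."""
    return n_items * (n_items - 1) // 2


def pairwise_distances(x):
    """Euclidean distances between all rows of the configuration x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(d, k):
    """Classical (Torgerson) metric MDS: embed dissimilarities d in k dimensions."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(b)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest eigenvalues
    # Negative eigenvalues (non-Euclidean input or rounding) are clipped to zero.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def stress1(d, x):
    """Kruskal's stress-1, using raw dissimilarities as the target (metric case)."""
    iu = np.triu_indices(d.shape[0], k=1)
    fitted = pairwise_distances(x)[iu]
    return np.sqrt(((d[iu] - fitted) ** 2).sum() / (fitted ** 2).sum())


def r_squared(d, x):
    """Squared correlation between input dissimilarities and fitted distances."""
    iu = np.triu_indices(d.shape[0], k=1)
    return np.corrcoef(d[iu], pairwise_distances(x)[iu])[0, 1] ** 2


# Made-up example: four items whose "true" positions form a 2 x 1 rectangle,
# so a two-dimensional solution should reproduce the dissimilarities exactly.
true_positions = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
dissimilarities = pairwise_distances(true_positions)

print("questions needed:", n_pairs(len(dissimilarities)))
for k in (1, 2, 3):
    config = classical_mds(dissimilarities, k)
    print(k, "dimension(s): stress =", stress1(dissimilarities, config),
          "R-squared =", r_squared(dissimilarities, config))
```

Comparing stress across candidate dimensionalities in this way is one informal aid to choosing the number of dimensions; it does not replace the model selection tools or the reliability tests listed above.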