== Technologies ==
In psychology, cognitive science, and neuroscience, there have been two main approaches to describing how humans perceive and classify emotion: continuous or categorical. The continuous approach tends to use dimensions such as negative vs. positive, calm vs. aroused. The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, disgust. Different kinds of machine learning regression and classification models can be used to have machines produce continuous or discrete labels. Sometimes models are also built that allow combinations across the categories, e.g. a happy-surprised face or a fearful-surprised face.<ref>{{Cite journal|title = A model of the perception of facial expressions of emotion by humans: Research overview and perspectives|last1 = Martinez|first1 = Aleix|last2 = Du|first2 = Shichuan|date = 2012|journal = The Journal of Machine Learning Research |volume=13 |issue=1 |pages=1589–1608|url=https://www.jmlr.org/papers/volume13/martinez12a/martinez12a.pdf}}</ref>

The following sections consider many of the kinds of input data used for the task of [[emotion recognition]].

===Emotional speech===
Various changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider range in pitch, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech.<ref>{{cite journal | last1=Breazeal | first1=Cynthia | last2=Aryananda | first2=Lijin |title=Recognition of Affective Communicative Intent in Robot-Directed Speech | journal=Autonomous Robots | publisher=Springer | volume=12 | issue=1 | year=2002 | issn=0929-5593 | doi=10.1023/a:1013215010749 | pages=83–104 | s2cid=459892 |url=http://web.media.mit.edu/~cynthiab/Papers/breazeal-aryananda-AutoRo02.pdf}}</ref> Some emotions have been found to be easier to identify computationally, such as anger<ref name="Dellaert" /> or approval.<ref>{{Cite book|last1=Roy|first1=D.|last2=Pentland|first2=A.|title=Proceedings of the Second International Conference on Automatic Face and Gesture Recognition |chapter=Automatic spoken affect classification and analysis |date=1996-10-01|pages=363–367|doi=10.1109/AFGR.1996.557292|isbn=978-0-8186-7713-7|s2cid=23157273}}</ref>

Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and [[prosody (linguistics)|prosodic]] features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.<ref name="Dellaert">Dellaert, F., Polzin, T., and Waibel, A., "Recognizing Emotion in Speech", in Proc. of ICSLP 1996, Philadelphia, PA, pp. 1970–1973, 1996</ref><ref name="Lee">Lee, C.M.; Narayanan, S.; Pieraccini, R., Recognition of Negative Emotion in the Human Speech Signals, Workshop on Automatic Speech Recognition and Understanding, Dec 2001</ref>

Speech analysis is an effective method of identifying affective state, with an average reported accuracy of 70 to 80% in research from 2003 and 2006.<ref>{{Cite journal|last1=Neiberg|first1=D|last2=Elenius|first2=K|last3=Laskowski|first3=K|date=2006|title=Emotion recognition in spontaneous speech using GMMs|url=http://www.speech.kth.se/prod/publications/files/1192.pdf|journal=Proceedings of Interspeech|doi=10.21437/Interspeech.2006-277|s2cid=5790745}}</ref><ref>{{Cite journal|last1=Yacoub|first1=Sherif|last2=Simske|first2=Steve|last3=Lin|first3=Xiaofan|last4=Burns|first4=John|date=2003|title=Recognition of Emotions in Interactive Voice Response Systems|journal=Proceedings of Eurospeech|pages=729–732|doi=10.21437/Eurospeech.2003-307 |citeseerx=10.1.1.420.8158|s2cid=11671944 }}</ref> These systems tend to outperform average human accuracy (approximately 60%<ref name="Dellaert" />) but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions.<ref name="Hudlicka-2003-p24">{{harvnb|Hudlicka|2003|p=24}}</ref> However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research.<ref name="Hudlicka-2003-p25">{{harvnb|Hudlicka|2003|p=25}}</ref>

====Algorithms====
The process of speech/text affect detection requires the creation of a reliable [[database]], [[knowledge base]], or [[vector space model]],<ref name="Osgood75">{{cite book | author = Charles Osgood |author2=William May|author3=Murray Miron | title = Cross-Cultural Universals of Affective Meaning | url = https://archive.org/details/crossculturaluni00osgo | url-access = registration | publisher = Univ. of Illinois Press | isbn = 978-94-007-5069-2 | year = 1975 }}</ref> broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification. {{Asof|2010}}, the most frequently used classifiers were linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture models (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms and hidden Markov models (HMMs).<ref name="Scherer-2010-p241">{{harvnb|Scherer|Bänziger|Roesch|2010|p=241}}</ref> Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system.<ref name="Hudlicka-2003-p24"/> The list below gives a brief description of each algorithm:
* [[Linear classifier|LDC]] – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of feature vectors.
* [[K-nearest neighbor algorithm|k-NN]] – Classification happens by locating the object in the feature space, and comparing it with the k nearest neighbors (training examples). The majority vote decides on the classification.
* [[Gaussian mixture model|GMM]] – a probabilistic model used for representing the existence of subpopulations within the overall population. Each sub-population is described using the mixture distribution, which allows for classification of observations into the sub-populations.<ref>[http://cnx.org/content/m13205/latest/ "Gaussian Mixture Model"]. Connexions – Sharing Knowledge and Building Communities. Retrieved 10 March 2011.</ref>
* [[Support vector machine|SVM]] – a type of (usually binary) linear classifier which decides to which of the two (or more) possible classes each input belongs.
* [[Artificial neural network|ANN]] – a mathematical model, inspired by biological neural networks, that can better capture possible non-linearities of the feature space.
* [[Decision tree learning|Decision tree algorithms]] – work based on following a decision tree in which leaves represent the classification outcome, and branches represent the conjunction of subsequent features that lead to the classification.
* [[Hidden Markov model|HMMs]] – a statistical Markov model in which the states and state transitions are not directly available to observation. Instead, the series of outputs dependent on the states is visible. In the case of affect recognition, the outputs represent the sequence of speech feature vectors, which allow the deduction of the sequence of states through which the model progressed. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The state sequence allows the prediction of the affective state being classified, and this is one of the most commonly used techniques within the area of speech affect detection.

It has been shown that, given enough acoustic evidence, the emotional state of a person can be classified by a set of majority-voting classifiers. One proposed ensemble is based on three main classifiers – kNN, C4.5 and an SVM with an RBF kernel – and achieves better performance than each basic classifier taken separately. It has been compared with two other ensembles, a one-against-all (OAA) multiclass SVM with hybrid kernels and an ensemble of two classifiers (C5.0 and a neural network), and the proposed variant achieves better performance than both.<ref>{{cite journal|url=http://ntv.ifmo.ru/en/article/11200/raspoznavanie_i_prognozirovanie_dlitelnyh__emociy_v_rechi_(na_angl._yazyke).htm|title=Extended speech emotion recognition and prediction|author=S.E. Khoruzhnikov|journal=Scientific and Technical Journal of Information Technologies, Mechanics and Optics|volume=14|issue=6|page=137|year=2014|display-authors=etal}}</ref>
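A minimal sketch of such a majority-voting ensemble, assuming the scikit-learn library and a matrix of pre-extracted acoustic features (one row and one emotion label per utterance), might look as follows; the decision tree stands in for C4.5, which scikit-learn does not implement, so this illustrates the general idea rather than the published system:

<syntaxhighlight lang="python">
# Illustrative sketch: majority-voting ensemble (kNN, decision tree, RBF-kernel SVM)
# for speech emotion classification. Assumes features have already been extracted
# from each utterance; X is (n_utterances, n_features), y holds emotion labels.
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_ensemble():
    # Hard voting: each classifier casts one vote and the majority label wins.
    return VotingClassifier(
        estimators=[
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
            ("tree", DecisionTreeClassifier(max_depth=10)),   # stand-in for C4.5
            ("svm_rbf", make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))),
        ],
        voting="hard",
    )

# Usage (X_train, y_train, X_test are assumed to exist):
# clf = build_ensemble().fit(X_train, y_train)
# predicted_emotions = clf.predict(X_test)
</syntaxhighlight>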
====Databases====
The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it requires choosing an appropriate database to train the classifier. Most of the currently available data was obtained from actors and is thus a representation of archetypal emotions. Those so-called acted databases are usually based on the Basic Emotions theory (by [[Paul Ekman]]), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former ones.<ref name="Ekman, P. 1969">Ekman, P. & Friesen, W. V. (1969). [http://www.communicationcache.com/uploads/1/0/8/8/10887248/the-repertoire-of-nonverbal-behavior-categories-origins-usage-and-coding.pdf The repertoire of nonverbal behavior: Categories, origins, usage, and coding]. Semiotica, 1, 49–98.</ref> Nevertheless, these still offer high audio quality and balanced classes (although often too few), which contribute to high success rates in recognizing emotions. However, for real-life applications, naturalistic data is preferred.

A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such a database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real-life implementation, because it describes states naturally occurring during [[human–computer interaction]] (HCI). Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surrounding noise and the distance of the subjects from the microphone. The first attempt to produce such a database was the FAU Aibo Emotion Corpus for CEICES (Combining Efforts for Improving Automatic Classification of Emotional User States), which was developed based on a realistic context of children (age 10–13) playing with Sony's Aibo robot pet.<ref name="Steidl-2011">{{cite web | last = Steidl | first = Stefan | title = FAU Aibo Emotion Corpus | publisher = Pattern Recognition Lab | date = 5 March 2011 | url = http://www5.cs.fau.de/de/mitarbeiter/steidl-stefan/fau-aibo-emotion-corpus/ }}</ref><ref name="Scherer-2010-p243">{{harvnb|Scherer|Bänziger|Roesch|2010|p=243}}</ref> In addition, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems.

====Speech descriptors====
The complexity of the affect recognition process increases with the number of classes (affects) and speech descriptors used within the classifier. It is, therefore, crucial to select only the most relevant features, both to ensure that the model can successfully identify emotions and to increase performance, which is particularly important for real-time detection. The range of possible choices is vast, with some studies mentioning the use of over 200 distinct features.<ref name="Scherer-2010-p241"/> Identifying features that are redundant or undesirable helps optimize the system and increase the rate of correct emotion detection. The most common speech characteristics are categorized into the following groups.<ref name="Steidl-2011"/><ref name="Scherer-2010-p243"/>
# Frequency characteristics<ref>{{Cite book |doi=10.1109/ICCCI50826.2021.9402569|isbn=978-1-7281-5875-4|chapter=Non-linear frequency warping using constant-Q transformation for speech emotion recognition|title=2021 International Conference on Computer Communication and Informatics (ICCCI)|pages=1–4|year=2021|last1=Singh|first1=Premjeet|last2=Saha|first2=Goutam|last3=Sahidullah|first3=Md|arxiv=2102.04029|s2cid=231846518}}</ref>
#* Accent shape – affected by the rate of change of the fundamental frequency.
#* Average pitch – description of how high/low the speaker speaks relative to normal speech.
#* Contour slope – describes the tendency of the frequency change over time; it can be rising, falling or level.
#* Final lowering – the amount by which the frequency falls at the end of an utterance.
#* Pitch range – measures the spread between the maximum and minimum frequency of an utterance.
# Time-related features:
#* Speech rate – describes the rate of words or syllables uttered over a unit of time.
#* Stress frequency – measures the rate of occurrences of pitch-accented utterances.
# Voice quality parameters and energy descriptors:
#* Breathiness – measures the aspiration noise in speech.
#* Brilliance – describes the dominance of high or low frequencies in the speech.
#* Loudness – measures the amplitude of the speech waveform, translating to the energy of an utterance.
#* Pause discontinuity – describes the transitions between sound and silence.
#* Pitch discontinuity – describes the transitions of the fundamental frequency.
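As a rough illustration of how a few of the descriptors above might be computed, the following sketch uses the librosa audio library; the helper name, pitch bounds and choice of descriptors are illustrative assumptions rather than a standard implementation:

<syntaxhighlight lang="python">
# Illustrative sketch: extracting a few prosodic descriptors from one utterance.
# Assumes the librosa library; extract_descriptors is a hypothetical helper name.
import numpy as np
import librosa

def extract_descriptors(wav_path):
    y, sr = librosa.load(wav_path, sr=None)                      # load audio at its native rate
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)  # pitch track (Hz)
    f0 = f0[~np.isnan(f0)]                                        # keep voiced-frame estimates only
    if f0.size == 0:
        raise ValueError("no voiced frames detected")
    rms = librosa.feature.rms(y=y)[0]                             # frame-wise energy (loudness proxy)
    return {
        "average_pitch": float(np.mean(f0)),                               # average pitch
        "pitch_range": float(np.max(f0) - np.min(f0)),                     # pitch range
        "final_lowering": float(np.mean(f0[:10]) - np.mean(f0[-10:])),     # crude end-of-utterance fall
        "loudness": float(np.mean(rms)),                                   # mean energy of the utterance
    }

# Usage: features = extract_descriptors("utterance.wav")
</syntaxhighlight>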
===Facial affect detection===
The detection and processing of facial expression are achieved through various methods such as [[optical flow]], [[hidden Markov model]]s, [[Artificial neural network|neural network]] processing or active appearance models. More than one modality can be combined or fused (multimodal recognition, e.g. facial expressions and speech prosody,<ref name="face-prosody">{{cite conference | url = http://www.image.ece.ntua.gr/php/savepaper.php?id=447 | first1 = G. | last1 = Caridakis | first2 = L. | last2 = Malatesta | first3 = L. | last3 = Kessous | first4 = N. | last4 = Amir | first5 = A. | last5 = Raouzaiou | first6 = K. | last6 = Karpouzis | title = Modeling naturalistic affective states via facial and vocal expressions recognition | conference = International Conference on Multimodal Interfaces (ICMI'06) | location = Banff, Alberta, Canada | date = November 2–4, 2006 }}</ref> facial expressions and hand gestures,<ref name="face-gesture">{{cite book | chapter-url = http://www.image.ece.ntua.gr/php/savepaper.php?id=334 | first1 = T. | last1 = Balomenos | first2 = A. | last2 = Raouzaiou | first3 = S. | last3 = Ioannou | first4 = A. | last4 = Drosopoulos | first5 = K. | last5 = Karpouzis | first6 = S. | last6 = Kollias | chapter = Emotion Analysis in Man-Machine Interaction Systems | editor1-first = Samy | editor1-last = Bengio | editor2-first = Herve | editor2-last = Bourlard | title = Machine Learning for Multimodal Interaction | series = [[Lecture Notes in Computer Science]] | volume = 3361| year = 2004 | pages = 318–328 | publisher = [[Springer-Verlag]] }}</ref> or facial expressions with speech and text for multimodal data and metadata analysis) to provide a more robust estimation of the subject's emotional state.
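A simple way to fuse modalities is late (decision-level) fusion, in which each modality's classifier outputs a probability distribution over emotions and the distributions are combined. The sketch below illustrates this with a weighted average; the model objects, label set and weight are assumptions, not a specific published system:

<syntaxhighlight lang="python">
# Illustrative sketch: late (decision-level) fusion of a facial-expression model
# and a speech-prosody model. Both models are assumed to expose predict_proba()
# returning probabilities over the same ordered list of emotion labels.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_predictions(face_probs, speech_probs, face_weight=0.6):
    """Weighted average of per-modality probability vectors."""
    face_probs = np.asarray(face_probs, dtype=float)
    speech_probs = np.asarray(speech_probs, dtype=float)
    fused = face_weight * face_probs + (1.0 - face_weight) * speech_probs
    fused /= fused.sum()                          # renormalize to a distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Usage with hypothetical per-modality models:
# label, scores = fuse_predictions(face_model.predict_proba(img)[0],
#                                  speech_model.predict_proba(feats)[0])
</syntaxhighlight>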
==== Facial expression databases ====
{{Main|Facial expression databases}}
Creation of an emotion database is a difficult and time-consuming task. However, database creation is an essential step in the creation of a system that will recognize human emotions. Most of the publicly available emotion databases include posed facial expressions only. In posed expression databases, the participants are asked to display different basic emotional expressions, while in spontaneous expression databases, the expressions are natural. Spontaneous emotion elicitation requires significant effort in selecting proper stimuli that can lead to a rich display of the intended emotions. Second, the process involves manual tagging of emotions by trained individuals, which makes the databases highly reliable. Since perception of expressions and their intensity is subjective in nature, the annotation by experts is essential for the purpose of validation.

Researchers work with three types of databases: databases of peak expression images only, databases of image sequences portraying an emotion from neutral to its peak, and video clips with emotional annotations. Many facial expression databases have been created and made public for expression recognition purposes. Two of the widely used databases are CK+ and JAFFE.

====Emotion classification====
{{Main|Emotion classification}}
Through cross-cultural research on the Fore people of Papua New Guinea at the end of the 1960s, [[Paul Ekman]] proposed the idea that facial expressions of emotion are not culturally determined, but universal. Thus, he suggested that they are biological in origin and can therefore be safely and correctly categorized.<ref name="Ekman, P. 1969"/> In 1972, he put forth six basic emotions:<ref>{{cite conference | last = Ekman | first = Paul | author-link = Paul Ekman | year = 1972 | title = Universals and Cultural Differences in Facial Expression of Emotion | editor-first = J. | editor-last = Cole | conference = Nebraska Symposium on Motivation | location = Lincoln, Nebraska | publisher = University of Nebraska Press | pages = 207–283 }}</ref>
* [[Anger]]
* [[Disgust]]
* [[Fear]]
* [[Happiness]]
* [[Sadness]]
* [[Surprise (emotion)|Surprise]]
However, in the 1990s Ekman expanded his list of basic emotions, including a range of positive and [[negative emotion]]s, not all of which are encoded in facial muscles.<ref>{{Cite book|last=Ekman |first=Paul |author-link=Paul Ekman |year=1999 |url=http://www.paulekman.com/wp-content/uploads/2009/02/Basic-Emotions.pdf |contribution=Basic Emotions |editor1-first=T |editor1-last=Dalgleish |editor2-first=M |editor2-last=Power |title=Handbook of Cognition and Emotion |place=Sussex, UK |publisher=John Wiley & Sons |url-status=dead |archive-url=https://web.archive.org/web/20101228085345/http://www.paulekman.com/wp-content/uploads/2009/02/Basic-Emotions.pdf |archive-date=2010-12-28 }}.</ref> The newly included emotions are:
# [[Amusement]]
# [[Contempt]]
# [[Contentment]]
# [[Embarrassment]]
# [[Anticipation (emotion)|Excitement]]
# [[Guilt (emotion)|Guilt]]
# [[Pride|Pride in achievement]]
# [[Relief (emotion)|Relief]]
# [[Contentment|Satisfaction]]
# [[Pleasure|Sensory pleasure]]
# [[Shame]]

====Facial Action Coding System====
{{Main|Facial Action Coding System}}
Psychologists have conceived a system to formally categorize the physical expression of emotions on faces. The central concept of the [[Facial Action Coding System]], or FACS, created by Paul Ekman and Wallace V. Friesen in 1978 based on earlier work by [[Carl-Herman Hjortsjö]],<ref>[http://face-and-emotion.com/dataface/facs/description.jsp "Facial Action Coding System (FACS) and the FACS Manual"] {{webarchive |url=https://web.archive.org/web/20131019130324/http://face-and-emotion.com/dataface/facs/description.jsp |date=October 19, 2013 }}. A Human Face. Retrieved 21 March 2011.</ref> is the action unit (AU). Each AU is, basically, a contraction or relaxation of one or more muscles. Psychologists have proposed the following classification of basic emotions, according to their action units ("+" here means "and"):

{| class="wikitable sortable"
|-
! Emotion !! Action units
|-
| Happiness || 6+12
|-
| Sadness || 1+4+15
|-
| Surprise || 1+2+5B+26
|-
| Fear || 1+2+4+5+20+26
|-
| Anger || 4+5+7+23
|-
| Disgust || 9+15+16
|-
| Contempt || R12A+R14A
|}
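The table above amounts to a small lookup from sets of detected action units to prototypical emotions. A minimal sketch of that mapping, assuming an upstream detector that outputs AU codes as strings, might look like this:

<syntaxhighlight lang="python">
# Illustrative sketch: mapping detected FACS action units to the prototypical
# emotions listed in the table above. Input is assumed to be a collection of
# AU codes as strings (e.g. from a hypothetical upstream AU detector).
AU_TO_EMOTION = {
    frozenset({"6", "12"}): "Happiness",
    frozenset({"1", "4", "15"}): "Sadness",
    frozenset({"1", "2", "5B", "26"}): "Surprise",
    frozenset({"1", "2", "4", "5", "20", "26"}): "Fear",
    frozenset({"4", "5", "7", "23"}): "Anger",
    frozenset({"9", "15", "16"}): "Disgust",
    frozenset({"R12A", "R14A"}): "Contempt",   # unilateral (right-side) action units
}

def emotion_from_aus(detected_aus):
    """Return the prototypical emotion whose AU set exactly matches, if any."""
    return AU_TO_EMOTION.get(frozenset(detected_aus), "unknown")

# Usage: emotion_from_aus(["4", "5", "7", "23"]) -> "Anger"
</syntaxhighlight>

In practice, detectors report partial or additional AUs, so exact-match lookup of this kind is a simplification (see the challenges discussed below).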
====Challenges in facial detection====
As with every computational method, in affect detection by facial processing some obstacles need to be overcome in order to fully exploit the potential of the algorithm or method employed. In the early days of almost every kind of AI-based detection (speech recognition, face recognition, affect recognition), the accuracy of modeling and tracking was an issue. As hardware evolves, as more data are collected and as new discoveries are made and new practices introduced, this lack of accuracy fades, leaving behind noise issues. However, methods for noise removal exist, including neighborhood averaging, [[Gaussian blur|linear Gaussian smoothing]], median filtering,<ref>{{cite web|url=http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT5/node3.html|title=Spatial domain methods}}</ref> or newer methods such as the Bacterial Foraging Optimization Algorithm.<ref>Clever Algorithms. [http://www.cleveralgorithms.com/nature-inspired/swarm/bfoa.html "Bacterial Foraging Optimization Algorithm – Swarm Algorithms – Clever Algorithms"] {{Webarchive|url=https://web.archive.org/web/20190612144816/http://www.cleveralgorithms.com/nature-inspired/swarm/bfoa.html |date=2019-06-12 }}. Clever Algorithms. Retrieved 21 March 2011.</ref><ref>[http://www.softcomputing.net/bfoa-chapter.pdf "Soft Computing"]. Soft Computing. Retrieved 18 March 2011.</ref>

Other challenges include:
* The fact that posed expressions, as used by most subjects of the various studies, are not natural, and therefore algorithms trained on these may not apply to natural expressions.
* The lack of rotational movement freedom. Affect detection works very well with frontal views, but upon rotating the head more than 20 degrees, "there've been problems".<ref>Williams, Mark. [http://www.technologyreview.com/Infotech/18796/?a=f "Better Face-Recognition Software – Technology Review"] {{Webarchive|url=https://web.archive.org/web/20110608023222/http://www.technologyreview.com/Infotech/18796/?a=f |date=2011-06-08 }}. Technology Review: The Authority on the Future of Technology. Retrieved 21 March 2011.</ref>
* Facial expressions do not always correspond to an underlying emotion that matches them (e.g. they can be posed or faked, or a person can feel emotions but maintain a "poker face").
* FACS did not include dynamics, while dynamics can help disambiguate (e.g. smiles of genuine happiness tend to have different dynamics than "try to look happy" smiles).
* The FACS combinations do not correspond in a 1:1 way with the emotions that the psychologists originally proposed (note that this lack of a 1:1 mapping also occurs in speech recognition with homophones and homonyms and many other sources of ambiguity, and may be mitigated by bringing in other channels of information).
* Accuracy of recognition is improved by adding context; however, adding context and other modalities increases computational cost and complexity.

===Body gesture===
{{Main|Gesture recognition}}
Gestures could be efficiently used as a means of detecting a particular emotional state of the user, especially when used in conjunction with speech and face recognition. Depending on the specific action, gestures could be simple reflexive responses, such as shrugging the shoulders when a person does not know the answer to a question, or they could be complex and meaningful, as in communication with sign language. Without making use of any object or surrounding environment, people can wave their hands, clap or beckon. On the other hand, when using objects, they can point at, move, touch or handle them. A computer should be able to recognize these, analyze the context and respond in a meaningful way, in order to be used efficiently for human–computer interaction.

Many methods have been proposed<ref name="JK">J. K. Aggarwal, Q. Cai, Human Motion Analysis: A Review, Computer Vision and Image Understanding, Vol. 73, No. 3, 1999</ref> for detecting body gestures. Some literature differentiates two approaches to gesture recognition: 3D-model-based and appearance-based.<ref name="Vladimir">{{cite journal | first1 = Vladimir I. | last1 = Pavlovic | first2 = Rajeev | last2 = Sharma | first3 = Thomas S. | last3 = Huang | url = http://www.cs.rutgers.edu/~vladimir/pub/pavlovic97pami.pdf | title = Visual Interpretation of Hand Gestures for Human–Computer Interaction: A Review | journal = [[IEEE Transactions on Pattern Analysis and Machine Intelligence]] | volume = 19 | issue = 7 | pages = 677–695 | year = 1997 | doi = 10.1109/34.598226 | s2cid = 7185733 }}</ref> The former makes use of 3D information about key elements of the body parts in order to obtain several important parameters, like palm position or joint angles. Appearance-based systems, on the other hand, use images or videos for direct interpretation. Hand gestures have been a common focus of body gesture detection methods.<ref name="Vladimir"/>

===Physiological monitoring===
A user's affective state can be detected by monitoring and analyzing their physiological signs. These signs range from changes in heart rate and skin conductance to minute contractions of the facial muscles and changes in facial blood flow. This area is gaining momentum, and real products that implement the techniques are now appearing. The four main physiological signs that are usually analyzed are [[Pulse|blood volume pulse]], [[Skin conductance|galvanic skin response]], [[facial electromyography]], and facial color patterns.

====Blood volume pulse====
=====Overview=====
A subject's blood volume pulse (BVP) can be measured by a process called [[photoplethysmography]], which produces a graph indicating blood flow through the extremities.<ref name="Picard, Rosalind 1998">Picard, Rosalind (1998). Affective Computing. MIT.</ref> The peaks of the waves indicate a cardiac cycle where the heart has pumped blood to the extremities. If the subject experiences fear or is startled, their heart usually 'jumps' and beats quickly for some time, causing the amplitude of the cardiac cycle to increase. This can clearly be seen on a photoplethysmograph when the distance between the trough and the peak of the wave has decreased. As the subject calms down, and as the body's inner core expands, allowing more blood to flow back to the extremities, the cycle will return to normal.

=====Methodology=====
Infra-red light is shone on the skin by special sensor hardware, and the amount of light reflected is measured. The amount of reflected and transmitted light correlates to the BVP, as light is absorbed by hemoglobin, which is found richly in the bloodstream.
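As a rough illustration of how a digitized photoplethysmography signal might be summarized, the sketch below detects cardiac peaks and derives heart rate and pulse amplitude; it assumes SciPy and is not tied to any particular sensor:

<syntaxhighlight lang="python">
# Illustrative sketch: deriving heart rate and pulse amplitude from a digitized
# blood volume pulse (BVP) signal. Assumes SciPy; `bvp` is a 1-D NumPy array and
# `fs` the sampling rate in Hz.
import numpy as np
from scipy.signal import find_peaks

def bvp_summary(bvp, fs):
    # Each systolic peak marks one cardiac cycle; require peaks to be at least
    # 0.4 s apart (i.e. assume heart rate below 150 bpm).
    peaks, _ = find_peaks(bvp, distance=int(0.4 * fs))
    intervals = np.diff(peaks) / fs                            # inter-beat intervals in seconds
    heart_rate = 60.0 / np.mean(intervals)                     # beats per minute
    troughs, _ = find_peaks(-bvp, distance=int(0.4 * fs))
    amplitude = np.mean(bvp[peaks]) - np.mean(bvp[troughs])    # mean peak-to-trough amplitude
    return heart_rate, amplitude

# Usage: hr_bpm, pulse_amplitude = bvp_summary(bvp, fs=64)
</syntaxhighlight>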
=====Disadvantages=====
It can be cumbersome to ensure that the sensor shining an infra-red light and monitoring the reflected light is always pointing at the same extremity, especially since subjects often stretch and readjust their position while using a computer. There are other factors that can affect one's blood volume pulse. As it is a measure of blood flow through the extremities, if the subject feels hot, or particularly cold, their body may allow more, or less, blood to flow to the extremities, regardless of the subject's emotional state.

[[File:Em-face-2.png|thumb|left|The corrugator supercilii muscle and zygomaticus major muscle are the two main muscles used for measuring electrical activity in facial electromyography.]]

====Facial electromyography====
{{Main|Facial electromyography}}
Facial electromyography is a technique used to measure the electrical activity of the facial muscles by amplifying the tiny electrical impulses that are generated by muscle fibers when they contract.<ref name="Larsen JT 2003">Larsen JT, Norris CJ, Cacioppo JT, "[https://web.archive.org/web/20181030170423/https://pdfs.semanticscholar.org/c3a5/4bfbaaade376aee951fe8578e6436be59861.pdf Effects of positive and negative affect on electromyographic activity over zygomaticus major and corrugator supercilii]", (September 2003)</ref> The face expresses a great deal of emotion; however, two main facial muscle groups are usually studied to detect it: the corrugator supercilii muscle, also known as the 'frowning' muscle, which draws the brow down into a frown and is therefore the best test for a negative, unpleasant emotional response, and the zygomaticus major muscle, which is responsible for pulling the corners of the mouth back in a smile and is therefore the muscle used to test for a positive emotional response.

[[File:Gsrplot.svg|500px|thumb|A plot of skin resistance, measured using GSR, against time while the subject played a video game. Several clear peaks in the graph suggest that GSR is a good method of differentiating between an aroused and a non-aroused state. For example, at the start of the game, where there is usually little exciting gameplay, a high level of resistance is recorded, suggesting a low level of conductivity and therefore less arousal. This is in clear contrast with the sudden trough where the player's character is killed, as the player is usually very stressed and tense at that point in the game.]]

====Galvanic skin response====
{{Main|Galvanic skin response}}
Galvanic skin response (GSR) is an outdated term for a more general phenomenon known as [[Electrodermal activity]] or EDA. EDA is a general phenomenon whereby the skin's electrical properties change. The skin is innervated by the [[sympathetic nervous system]], so measuring its resistance or conductance provides a way to quantify small changes in the sympathetic branch of the autonomic nervous system. As the sweat glands are activated, even before the skin feels sweaty, the level of the EDA can be captured (usually using conductance) and used to discern small changes in autonomic arousal. The more aroused a subject is, the greater the skin conductance tends to be.<ref name="Picard, Rosalind 1998"/>

Skin conductance is often measured using two small [[silver-silver chloride]] electrodes placed somewhere on the skin, with a small voltage applied between them. To maximize comfort and reduce irritation, the electrodes can be placed on the wrist, legs, or feet, which leaves the hands fully free for daily activity.
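A toy sketch of how such a conductance trace might be summarized for arousal detection is shown below; it assumes NumPy, a signal in microsiemens, and arbitrarily chosen window and threshold values:

<syntaxhighlight lang="python">
# Illustrative sketch: summarizing an electrodermal activity (EDA) trace.
# Assumes NumPy; `conductance` is a 1-D array in microsiemens sampled at `fs` Hz.
# Window length and response threshold are arbitrary illustrative choices.
import numpy as np

def eda_summary(conductance, fs, window_s=5.0, threshold_us=0.05):
    window = int(window_s * fs)
    kernel = np.ones(window) / window
    tonic = np.convolve(conductance, kernel, mode="same")    # slow-moving baseline (tonic level)
    phasic = conductance - tonic                              # fast fluctuations (phasic responses)
    # Count upward threshold crossings as skin conductance responses (SCRs).
    above = phasic > threshold_us
    scr_count = int(np.sum(above[1:] & ~above[:-1]))
    return {
        "mean_tonic_level": float(np.mean(tonic)),   # higher level ~ greater arousal
        "scr_count": scr_count,                      # number of phasic responses
    }

# Usage: summary = eda_summary(conductance, fs=32)
</syntaxhighlight>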
====Facial color====
=====Overview=====
The surface of the human face is supplied by a large network of blood vessels. Blood flow variations in these vessels yield visible color changes on the face. Whether or not facial emotions activate facial muscles, variations in blood flow, blood pressure, glucose levels, and other changes occur. Also, the facial color signal is independent of that provided by facial muscle movements.<ref name="face">{{cite journal | last1=Benitez-Quiroz | first1=Carlos F. | last2=Srinivasan | first2=Ramprakash | last3=Martinez | first3=Aleix M. | title=Facial color is an efficient mechanism to visually transmit emotion | journal=Proceedings of the National Academy of Sciences | volume=115 | issue=14 | date=2018-03-19 | doi=10.1073/pnas.1716084115 | pages=3581–3586| pmid=29555780 | pmc=5889636 | bibcode=2018PNAS..115.3581B | doi-access=free }}</ref>

=====Methodology=====
Approaches are based on facial color changes. Delaunay triangulation is used to create triangular local areas, and the triangles that define the interior of the mouth and eyes (sclera and iris) are removed. The pixels of the remaining triangular areas are used to create feature vectors.<ref name="face"/> Converting the pixel colors from the standard RGB color space to a color space such as the oRGB color space<ref name="orgb">{{cite journal | last1=Bratkova | first1=Margarita | last2=Boulos | first2=Solomon | last3=Shirley | first3=Peter | title=oRGB: A Practical Opponent Color Space for Computer Graphics | journal=IEEE Computer Graphics and Applications | volume=29 | issue=1 | year=2009 | doi=10.1109/mcg.2009.13 | pages=42–55| pmid=19363957 | s2cid=16690341 }}</ref> or LMS channels has been shown to perform better when dealing with faces.<ref name="mec">Hadas Shahar, [[Hagit Hel-Or]], [http://openaccess.thecvf.com/content_ICCVW_2019/papers/CVPM/Shahar_Micro_Expression_Classification_using_Facial_Color_and_Deep_Learning_Methods_ICCVW_2019_paper.pdf Micro Expression Classification using Facial Color and Deep Learning Methods], The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 0–0.</ref> The feature vectors are therefore mapped onto the better-performing color space and decomposed into red-green and yellow-blue channels, and deep learning methods are then used to identify the corresponding emotions.

===Visual aesthetics===
Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities is a highly subjective task. Computer scientists at Penn State treat the challenge of automatically inferring the aesthetic quality of pictures using their visual content as a machine learning problem, with a peer-rated online photo sharing website as a data source.<ref name="datta">Ritendra Datta, Dhiraj Joshi, [[Jia Li]] and James Z. Wang, [https://web.archive.org/web/20181030170421/https://pdfs.semanticscholar.org/8772/877ceb40d6d8685655145034740f3df7baad.pdf Studying Aesthetics in Photographic Images Using a Computational Approach], Lecture Notes in Computer Science, vol. 3953, Proceedings of the European Conference on Computer Vision, Part III, pp. 288–301, Graz, Austria, May 2006.</ref> They extract certain visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images.
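As a toy illustration of this kind of pipeline (not the feature set of the cited study), the following sketch computes a few global image statistics and fits a classifier against peer-supplied ratings; the libraries, features and threshold are assumptions:

<syntaxhighlight lang="python">
# Illustrative sketch: treating aesthetic quality prediction as supervised learning.
# The three global statistics below are toy features, not those of the cited study.
# Assumes Pillow, NumPy and scikit-learn; image paths and peer ratings are supplied
# by the caller.
import numpy as np
from PIL import Image
from sklearn.svm import SVC

def image_features(path):
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    saturation, value = hsv[..., 1], hsv[..., 2]
    return [value.mean(),        # overall brightness
            saturation.mean(),   # overall colorfulness
            value.std()]         # crude contrast measure

def train_aesthetics_model(image_paths, peer_scores, threshold=5.0):
    X = np.array([image_features(p) for p in image_paths])
    y = (np.asarray(peer_scores) >= threshold).astype(int)   # pleasing vs. displeasing
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

# Usage: model = train_aesthetics_model(paths, scores); model.predict([image_features(p)])
</syntaxhighlight>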