Today anyone can purchase technology that can measure, quantify, and record human biometric data. Wearable or portable sensors such as the 'Fitbit' detect and track heart rate, steps taken, glucose levels, water quality, genomes and microbiomes, and turn them into electronic data [1]. Meanwhile, some biometric signals are already applied in many areas of life. Take the face as an example: as one of the most detailed parts of the human body, it is even used as a lock on smartphones and laptops. So, treating people as users of architecture, what if we use this biometric data as an input to affect the environment? What if we can make an interior surface that recognises human faces and gives relevant responses, as humans do, in a playful way? The project therefore aims to create a responsive surface, using a combination of a deployable rigid structure and soft actuation, that interacts with humans through biometric data such as facial expressions.
There is plenty of biometric data, as mentioned, but in this report the author will focus on facial expressions. Humans interact with others mainly through speech, but also through body gestures, to emphasise certain parts of speech and to display emotion. Emotions are displayed by visual, vocal and other physiological means. According to Russell's circumplex model of affect, which applies Ross's (1938) scaling techniques for a circular ordering of eight categories, human emotion can be placed in a two-dimensional space: the horizontal dimension in this spatial metaphor is pleasure-displeasure, and the vertical is arousal-sleep [2]. One of the important ways in which humans display pleasure or displeasure is through facial expressions [3]. It is easy for humans to read facial expressions, but this is difficult for machines, which only understand numbers. Therefore, how can we interpret a facial expression as numbers? How can a kinetic, artificial surface recognise facial expressions in a way comparable to humans? In this report, the author will focus on making the surface detect human facial expressions and respond like a living creature.
1.2 Research method
Here is an overview of the group task:
The research in this thesis is based on the following methods:
– Research on the universality of facial expressions and their recognition.
– Analysing and comparing existing technologies to select a suitable computer vision method for detecting facial expressions.
– Collecting more sample faces to train the computer to interpret specific facial expressions.
– Proposing and evaluating the design of the output for different facial expressions.
– Experiments based on prototypes from different stages of the research.
2. Facial Recognition
The reason for choosing facial expression is that it is one of the most natural, powerful and immediate ways for humans to communicate their intentions and emotions [4]. Facial expressions are visually observable, conversational and interactive signals that convey our current focus of attention and regulate our interactions with the environment and other people [5]. Automatic facial expression analysis is an interesting and challenging problem, with important applications today in many areas such as human-computer interaction.
2.1 Reading basic facial expressions
How exactly do we read human emotion from the face? In most facial expression recognisers, facial feature extraction is followed by classification into an expression class. Tomkins provided a theoretical rationale for studying the face as a method of learning about personality and emotion. He also showed that observers can reach very high agreement in judging emotion if the facial expressions are carefully selected to show what he believed are the innate facial affects [6]. Six basic expression classes defined by Ekman are often used: fear, disgust, sadness, happiness, surprise and anger, as shown in Fig. 6. His research found emotions that can be identified universally from facial expressions, across cultures, species and ages [7]. However, the facial expressions presented in such studies are not spontaneous but posed by subjects instructed to show a particular emotion or to move particular facial muscles.
Russell's research on a circumplex model of affect [2] also addresses the relationship between human emotion and facial expression. As can be seen in Fig. 7, eight variables fit in a circle in a two-dimensional space, in a manner analogous to points on a compass. The horizontal (east-west) dimension is pleasure-displeasure, and the vertical (north-south) dimension is arousal-sleep. The remaining four variables do not form independent dimensions but help to define the quadrants of the space. Biometric data such as heart rate and skin humidity can be used, to a certain extent, to express the arousal level on the vertical axis. The horizontal axis, which shows whether people are in a positive or negative mood, can be read from facial expressions according to Ekman's research [7].
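To make this mapping concrete, the short Processing sketch below places a reading on the two axes of the circumplex. It is only an illustrative sketch: the smile measure and heart rate are stand-ins driven by the mouse, and the map() ranges are assumptions for illustration rather than values taken from the project.

// Illustrative sketch: placing a reading on Russell's two axes.
float pleasure;  // -1 (displeasure) .. +1 (pleasure), e.g. derived from facial expression
float arousal;   // -1 (sleep) .. +1 (arousal), e.g. derived from heart rate

void setup() {
  size(400, 400);
}

void draw() {
  background(255);
  // axes of the circumplex: horizontal = pleasure-displeasure, vertical = arousal-sleep
  line(0, height/2, width, height/2);
  line(width/2, 0, width/2, height);

  // hypothetical inputs: a smile measure and a heart rate in beats per minute
  float smileValue = mouseX / float(width);           // stand-in for a smile measure, 0..1
  float heartRate  = map(mouseY, 0, height, 50, 120); // stand-in for a heart-rate sensor

  pleasure = map(smileValue, 0, 1, -1, 1);
  arousal  = map(heartRate, 50, 120, -1, 1);

  // plot the current emotional state as a point in the two-dimensional space
  float x = map(pleasure, -1, 1, 0, width);
  float y = map(arousal, 1, -1, 0, height); // high arousal at the top of the window
  ellipse(x, y, 10, 10);
}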
In general, there are two approaches to representing the face and its features for facial expression analysis [8]: geometric feature-based methods and appearance-based methods.
Geometric feature-based methods represent the shape, texture and location of prominent components such as the mouth, eyes, nose, eyebrows and chin, which capture the variation in the appearance of facial expressions. When people express emotions with their faces, certain distances and proportions between features change. Appearance-based methods, on the other hand, apply image filters such as Gabor wavelets to generate facial features for either the whole face or specific regions of a face image. Appearance-based methods are also the basic logic by which the webcam distinguishes the key points, discussed further in chapter 2.2. Since the computer works only with numbers, it needs the positions of these key points, and geometric feature-based methods are then used to interpret that data as specific facial expressions. How this process works is explored in chapter 3.
2.2 Detecting facial expressions by technical means
Many techniques can be used in computer vision to automatically track facial features. For example, Black and Yacoob [9] use a local parameterised model of image motion obtained from optical flow analysis (Fig. 10). They use a planar model for rigid facial motion and an affine-plus-curvature model for non-rigid motion. Essa and Pentland [10] first locate the nose, eyes and mouth; then, from two consecutive normalised frames, a 2D spatio-temporal motion energy representation of facial motion is used as a dynamic face model. Cohn et al. [11] use feature points that are automatically tracked with a hierarchical optical flow method. The feature vectors used for recognition are created by calculating the displacement of the facial points; the displacement of a point is obtained by subtracting its normalised position in the first frame from its current normalised position. Tian et al. [12] proposed a feature-based method, as mentioned, which uses geometric and motion facial features and detects transient facial features. The extracted features (mouth, eyes, brows and cheeks, as shown in Fig. 9) are represented with geometric and motion parameters. Furrows are also detected, using a Canny edge detector (Fig. 11), to measure their orientation and quantify their intensity. The parameters of the lower and upper face are then fed into separate neural networks trained to recognise AUs (action units of the Facial Action Coding System). From this history of facial detection by computer vision, we can identify the general process of facial expression detection: first, generate a basic face model; second, locate key points; third, track the movement of those key points. These steps are applied in the experiments in chapter 3.
2.3 Applications in artworks
In this project we have to think not only about the technique to detect facial expressions but also about how to use the data as an input to control the responsive surface. Here are two projects related to translating facial expressions into visible effects. The first, called Face Dance [13], was designed by filmmakers Ariel Schulman and Henry Joost and creative coders Aaron Meyers, Lauren McCarthy and James George. The group used facial recognition software and motion capture of a Michael Jackson impersonator to create a bank of moves for a program that allows people to control a projected image of the King of Pop by moving the muscles in their face. The project amplifies tiny changes in facial expression. Inspired by this, our project can also use physical motion to show these tiny movements in a more obvious way, making people feel how rich and powerful their face can be, as Face Dance did. The equipment that Face Dance uses is a depth sensor (Fig. 12), a high-end and expensive but very precise method. It produces an accurate real-time digital model of the face by detecting the depth of its surface and generating a real-time three-dimensional model of the face being detected, which becomes the input that controls the digital dancer. This equipment is quite precise, but considering our financial limitations it would be too expensive for us to use.
The other project, named Facey Space [14], was created in 2013. It used the face recognition software Face OSC and a webcam to externalise human emotions by projecting sounds and colours corresponding to a particular emotion. The creators developed a program in Processing (a coding environment) that uses facial movement as a live data input. This method uses an ordinary webcam, which is not very accurate to some extent but much easier to set up, and the equipment comes at a reasonable price for us. Moreover, the project genuinely interprets facial expressions as various kinds of emotion. As a result, in the following research the author worked on detecting facial expressions with a webcam combined with Face OSC and translating the data into specific emotions; more details are discussed in the next chapter. Facey Space also uses colour as a response to different emotions, and the author was inspired by this to carry out some colour-related experiments based on the project.
3. Facial expression detection technology
3.1 Face OSC and Webcam
Though much progress has been made, recognising facial expressions with high accuracy remains difficult because of their subtlety, complexity and variability. After researching techniques such as the depth sensor, which generates a 3D model of the human face, and the webcam, which captures a 2D image of the face, as mentioned before, the author prefers webcam detection (Fig. 17), because this kind of equipment can be found nearly anywhere and is easy to obtain and set up.
As mentioned in chapter 2.1, two kinds of method, geometric feature-based and appearance-based, can be used to detect faces in computer vision. Geometric feature-based methods are the key to how computer programs detect the key points on a human face, as shown in figure n. Inspired by the art project Facey Space, the research continued with Face OSC (Fig. 18), a powerful real-time face tracker made by Kyle McDonald [15]. It can be used as a tool for prototyping face-based interaction. In general, it helps the computer to abstract the key points on the face and keep tracking their movements. The program then outputs the x- and y-position movements of the key points, which we can interpret as specific emotions. First, the webcam captures the feature locations on the face; second, the program extracts the features and keeps tracking their movement; then feature parameters are output according to the motion [16]. As can be seen in Fig. 19, each motion is expressed in AUs (action units of the Facial Action Coding System) compared with a neutral face.
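As an indication of how this data arrives in Processing, the minimal sketch below receives a few of the Face OSC values using the oscP5 library. It assumes Face OSC is running and sending OSC messages to port 8338 (the default in the Face OSC examples); the address patterns follow the published example sketches and should be checked against the version in use.

// Minimal sketch: receive Face OSC key-point data in Processing via oscP5.
import oscP5.*;

OscP5 oscP5;
float mouthWidth, mouthHeight;
float leftEyebrow, rightEyebrow;
int found; // 1 when a face is detected

void setup() {
  size(400, 400);
  oscP5 = new OscP5(this, 8338); // listen on the default Face OSC port
}

void oscEvent(OscMessage m) {
  if (m.checkAddrPattern("/found"))                 found        = m.get(0).intValue();
  if (m.checkAddrPattern("/gesture/mouth/width"))   mouthWidth   = m.get(0).floatValue();
  if (m.checkAddrPattern("/gesture/mouth/height"))  mouthHeight  = m.get(0).floatValue();
  if (m.checkAddrPattern("/gesture/eyebrow/left"))  leftEyebrow  = m.get(0).floatValue();
  if (m.checkAddrPattern("/gesture/eyebrow/right")) rightEyebrow = m.get(0).floatValue();
}

void draw() {
  background(255);
  fill(0);
  text("face found: " + found, 20, 20);
  text("mouth w/h:  " + nf(mouthWidth, 1, 2) + " / " + nf(mouthHeight, 1, 2), 20, 40);
  text("eyebrows:   " + nf(leftEyebrow, 1, 2) + " / " + nf(rightEyebrow, 1, 2), 20, 60);
}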
But how can the webcam catch the key points on a face? One of the means is based on appearance-based methods. When the webcam captures the real-time image, the computer filters it into colour pixels (Fig. 20, 21). Key points such as the corners of the eyes and mouth produce characteristic filter responses because of the geometry of the human face: human beings have similar skeletons, and the features of the face create distinctive shading. As a result, each key point on the face has its own characteristic filter response, which allows the computer to recognise that point and then keep tracking its motion. Take the corner of the eye as an example: as shown in Fig. 22, each pixel around the eye has a particular brightness and hue. The computer analyses the brightness of every pixel around the candidate point and obtains a specific response profile; when this profile matches certain parameters (Fig. 23), the point is identified as the corner of the eye [17].
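The sketch below is a deliberately simplified illustration of this principle in Processing: it samples the brightness of a small window around a candidate point and scores it against a stored template. Real trackers such as Face OSC use far more sophisticated appearance models; the file name, window size and matching function here are assumptions for illustration only.

// Simplified illustration of the appearance-based idea.
PImage face;

void setup() {
  size(400, 400);
  face = loadImage("face.jpg"); // any test photograph; the file name is a placeholder
  face.resize(width, height);
  face.loadPixels();
}

// returns the brightness profile of an n x n window centred on (cx, cy)
float[] brightnessProfile(int cx, int cy, int n) {
  float[] profile = new float[n * n];
  int k = 0;
  for (int y = cy - n/2; y < cy - n/2 + n; y++) {
    for (int x = cx - n/2; x < cx - n/2 + n; x++) {
      int xi = constrain(x, 0, face.width - 1);
      int yi = constrain(y, 0, face.height - 1);
      profile[k++] = brightness(face.pixels[yi * face.width + xi]);
    }
  }
  return profile;
}

// sum of squared differences between a sampled profile and a stored template:
// the smaller the score, the better the candidate matches, e.g., an eye corner
float matchScore(float[] profile, float[] template) {
  float score = 0;
  for (int i = 0; i < profile.length; i++) {
    float d = profile[i] - template[i];
    score += d * d;
  }
  return score;
}

void draw() {
  image(face, 0, 0);
}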
The limitation is that the results are not always precise, for four main reasons. (1) The webcam quality can be low, with many blurry pixels that cannot be used as effective input. (2) The translated data is therefore not always accurate for the computer program: some facial expressions, such as slight sadness and disgust, produce similar output, and because these tiny motions appear only on a two-dimensional image of the frontal face, some emotions cannot be defined by numbers alone. (3) Facial expressions can easily be faked, so we cannot always say that a precise value means a certain kind of emotion. (4) There are unavoidable cultural differences between people; those from different parts of the world express themselves in different ways, subtle or exaggerated. As a result, we cannot always define one facial expression by specific data. Nonetheless, after a series of tests we can say that the webcam can capture the motion we need for the project at very low cost.
3.2 Experiment
3.2.1 Research 1: Catching the key points
In the first tests, Face OSC can feed the positions of the eyebrows, eyes, nose, mouth and jaw into Processing, a software platform for coding, so that we obtain vectors showing the distances and directions of movement. The tests also start to be based on neural networks [18]. Once the face is detected in the image, the corresponding region is extracted and usually normalised to a standard size, for example so that the distance between the two eyes is always the same.
The database used for this experiment is the author's own face. As can be seen in Fig. 24, the data can be obtained by changing one's own facial expression to, for example, anger, sadness or happiness. This data can be interpreted as emotions by calculating key-point motions, such as the scale between the height and width of the mouth or the distance between the two eyes.
Here the author provides more detail on how exactly the key-point motion data is translated. Take the eyes as an example: the gap between the two eyes and the distance between eyebrow and eye provide a lot of information. Processing shows the changes of these points: when getting angry, most people frown, so the eyebrow moves closer to the eye, and people tend to raise their eyebrows when they are surprised. For the author's face, as shown in Fig. 26, the distance between eyebrow and eye is around 5.2 pixels; it changes to about 4.7 pixels with an angry face and to more than 6 pixels with a surprised face. Another example is the shape of the mouth: the scale between the height and width of the mouth can also be calculated as the facial expression changes. In a neutral mood the author's personal scale value is 10.14; it becomes 8.9 with a smile, and the mouth height exceeds 4 pixels with a surprised face.
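As a minimal sketch of this threshold logic, assuming the distances above have already been measured from the Face OSC key points, a simple conditional can assign an expression label. The variable names and exact comparisons are assumptions for illustration, and the thresholds taken from the author's face would need recalibration for another face or camera.

// Sketch of the threshold logic, using the values measured on the author's face.
String classify(float browEyeDist, float mouthScale, float mouthHeight) {
  if (browEyeDist < 4.9) return "angry";                          // eyebrow pulled down towards the eye (5.2 -> 4.7)
  if (browEyeDist > 6.0 && mouthHeight > 4.0) return "surprised"; // raised brows and an open mouth
  if (mouthScale < 9.5) return "happy";                           // mouth scale drops from about 10.14 to 8.9 when smiling
  return "neutral";
}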
3.2.2 Research 2: Collecting and generating data
The database used for the second experiment is 60 images of 6 facial expressions (Fig. 27): a neutral face, smile, laugh, anger, sadness and surprise, collected from students of mixed gender and different cultural backgrounds at the Bartlett, UCL. These volunteers were asked to sit in front of the webcam and to change their facial expressions, and the author pressed a dedicated key for each facial expression to record the relevant data. The table below shows some of the data and the relevant analysis. It is clear from these tables how the key points change across different facial expressions, and the scale and proportion can also be calculated from the data. After comparing and analysing different people making the same facial expression, the result is a way of defining the values as specific emotions. Ekman's research states that some emotions can be identified universally from facial expressions (across cultures, species and ages) [7]. Supported by this theory and by the author's analysis, a main trend can be identified for each facial expression even among people with various cultural backgrounds, although the values are not exactly the same. As a result, it is reasonable to distinguish specific emotions from particular ranges of values, and facial expressions can be recognised by this means.
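The recording step can be sketched as below, assuming the Face OSC values are already being received as in the earlier example. Pressing a key labels the current readings with an expression and appends one row to a CSV file; the key bindings and file name are assumptions for illustration.

// Sketch of the recording step. The four floats are assumed to be updated in
// oscEvent() as in the earlier Face OSC example; they are declared here so
// the sketch compiles on its own.
float mouthWidth, mouthHeight, leftEyebrow, rightEyebrow;
PrintWriter output;

void setup() {
  output = createWriter("face_samples.csv"); // hypothetical file name
  output.println("expression,mouthWidth,mouthHeight,leftEyebrow,rightEyebrow");
}

void draw() {
  // nothing to draw; the sketch only listens for key presses
}

void keyPressed() {
  String label = "";
  if (key == 'n') label = "neutral";
  if (key == 's') label = "smile";
  if (key == 'l') label = "laugh";
  if (key == 'a') label = "anger";
  if (key == 'd') label = "sadness";
  if (key == 'u') label = "surprise";
  if (!label.equals("")) {
    output.println(label + "," + mouthWidth + "," + mouthHeight + ","
                 + leftEyebrow + "," + rightEyebrow);
    output.flush(); // keep the file up to date after every sample
  }
}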
4. Experiment Design
The author then began to think about what kind of output could be generated in response to specific emotions. Inspired by Facey Space, the art project mentioned in chapter 2.3, the experiments start with using colour as feedback. The next step considers the density of geometric patterns such as dots or lines, which can be related to arousal levels and show whether we are in a nervous or a relaxed mood. Finally, inspired by the reconfigurable structure from Harvard, the group started thinking of ways to use deployable objects as an output to amplify facial expressions.
4.1 Colour as feedback
For the data already generated, we know how to translate it. The if statement in Processing, a conditional function in the coding language, helps us assign definitions to these values. To use colour as feedback for emotions, the author first researched which colours films use to express specific emotions. Sixty film scenes (30 eastern, 30 western) with 3 specific emotions (20 for delight, 20 for anger and 10 for sadness) were input into the computer. The Processing program abstracted each image into a 5×8 colour pixel table and extracted the colours, as shown in Fig. 29. These colours were then applied as responses to different emotions through two digital patterns: one an unstable neural triangle (Fig. 30), intended to simulate a neural network, and the other a wave (Fig. 31).
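A sketch of this extraction step in Processing is shown below: each still is reduced to a 5×8 grid of pixels and read back as a small palette. The file name is a placeholder, and which dimension counts as rows or columns is an assumption.

// Sketch of the colour-extraction step: reduce a film still to a 5 x 8 palette.
PImage still;
color[] palette = new color[5 * 8];

void setup() {
  size(400, 250);
  still = loadImage("film_still.jpg"); // placeholder file name
  still.resize(8, 5);                  // assumed 8 columns x 5 rows of colour "pixels"
  still.loadPixels();
  for (int i = 0; i < palette.length; i++) {
    palette[i] = still.pixels[i];
  }
}

void draw() {
  // draw the extracted palette as large swatches
  noStroke();
  for (int row = 0; row < 5; row++) {
    for (int col = 0; col < 8; col++) {
      fill(palette[row * 8 + col]);
      rect(col * width / 8, row * height / 5, width / 8, height / 5);
    }
  }
}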
4.2 Geometry pattern change
After the first test, the group decided it was better to focus on one specific emotion and divide it into several levels rather than define various kinds of emotion. The intention is to help people keep a positive attitude towards their lives, so that the responsive surface can be an engaging and playful way to make people joyful. The author made the line game (Fig. 32) to test arousal levels, which correspond to the vertical axis of Russell's circumplex model of affect, using the gap between lines to show whether the player is in a relaxed or a nervous mood.
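A sketch of the line-game idea is given below: the gap between horizontal lines is driven by an arousal value between 0 (relaxed) and 1 (nervous). Here the arousal value is faked with the mouse position; in the project it would come from the facial or biometric data, and the gap range is an assumption.

// Sketch of the line game: line density reflects the arousal level.
void setup() {
  size(400, 400);
}

void draw() {
  background(255);
  float arousal = mouseX / float(width); // stand-in for the real arousal level, 0..1
  float gap = map(arousal, 0, 1, 40, 5); // relaxed = wide gaps, nervous = dense lines
  stroke(0);
  for (float y = 0; y < height; y += gap) {
    line(0, y, width, y);
  }
}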
4.3 Simulation on the deployable structure
Inspired by the reconfigurable-structure research from Harvard [19], and by a project from MIT's Media Lab named QUARTZ [20], which concerns self-folding origami technology, the group started thinking about whether a deployable rigid structure could be driven by soft actuation. Combining the research of the other group members, the author's group finally decided to use the deployable structure as a responsive output interacting with human facial expressions. Inspired by Harvard's design, the author made three deployable geometries in Rhino Grasshopper, a parametric 3D modelling tool, and used parameter sliders to simulate their changing process. The first step was to use smiling to trigger the angle of rotation: each smile detected changes the angle by 15 degrees. However, this made it difficult for the player to recognise the logic of the change, and hard to use additional parameters, such as heart rate, to trigger it at the same time. As a result, in the next step the author set the angle to change to specific degrees according to the level of the smile. For example, when the player shows a neutral face, nothing happens; when the player smiles slightly at the webcam, the deployable geometry rotates from 0 to 145 degrees and then back to 0; when the player laughs, it changes completely to the opposite side, 270 degrees, and then returns to the initial position. Compared with the previous test, each change starts from the same initial form and is identical every time the player shows the same facial expression, which makes the logic of the change much clearer for players to recognise.
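The second control logic can be sketched as a simple quantisation of the smile level into three target angles, with the geometry always returning to its initial position afterwards. The thresholds of 0.3 and 0.7 below are assumptions for illustration.

// Sketch of the second control logic: smile level -> target rotation angle.
float targetAngle(float smileLevel) {
  if (smileLevel > 0.7) return 270; // laugh: fold completely to the opposite side
  if (smileLevel > 0.3) return 145; // slight smile: partial opening
  return 0;                         // neutral face: stay in the initial position
}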
5. Application in the latest prototype
The latest prototype, made of flat plywood panels and rubber hinges, is a large hexagonal geometric unit driven by a soft actuator, an air pump. This means the degree of the smile needs to be interpreted as an amount of air supplied. For this, an air pump with model number CHC50/200 is used; it runs on 230 V, 50 Hz mains electricity, with a maximum air pressure of 8 bar.
How much air is needed for specific angles? The nominal flow rate of the air pump is 200 litres per minute, and a pressure of 1 bar is used for the test. The experiment uses an air pocket 24 cm high and 50 cm wide as the sample, with a 2 kg load on top. As can be seen in the table, it takes 11.62 seconds to move from 0 to 45 degrees; 2.44 seconds later, the sample reaches 90 degrees; 15.90 seconds is enough for the structure to reach 150 degrees from the initial form; and it takes 17.54 seconds in total for the two pieces to become completely flat.
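One way to turn a target angle into a pump running time is to interpolate between the measured points from this test, as sketched below. It assumes that "completely flat" corresponds to 180 degrees and that the motion between the measured points is roughly linear, both of which are simplifications.

// Sketch: target angle -> cumulative pump time, interpolated from the test data
// (1 bar, 200 l/min, 24 x 50 cm air pocket, 2 kg load).
float[] angles  = {  0,    45,    90,   150,   180 };
float[] seconds = {  0, 11.62, 14.06, 15.90, 17.54 }; // 11.62 + 2.44 = 14.06 s at 90 degrees

float inflationTime(float targetAngle) {
  targetAngle = constrain(targetAngle, 0, 180);
  for (int i = 1; i < angles.length; i++) {
    if (targetAngle <= angles[i]) {
      // linear interpolation between the two surrounding measurements
      return map(targetAngle, angles[i-1], angles[i], seconds[i-1], seconds[i]);
    }
  }
  return seconds[seconds.length - 1];
}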
Following this, the smile level (Fig. 37), combined with the arousal level (Fig. 38), is matched with the motion of the transformable structure. The results are shown here.
6. Conclusion
Facial expression recognition by computer vision is an interesting and challenging problem, with important applications in many areas such as human-computer interaction. In this report, the author starts with the question of how to make a deployable surface that can recognise human facial expressions and give relevant responses like a living creature. Based on this main question, the research drew on Ekman's work on universal facial expressions and on the research led by Russell on a circumplex model of affect. These two main theories explain the relationship between facial expression and emotion: they answer the question of why people can read emotion from facial expressions, and they show that emotions can be identified universally from facial expressions, across cultures, species and ages, which is important theoretical support for the possibility of detecting facial expressions by computer vision. From this, the author focused on the technological part of facial recognition. Two main technologies can be used to detect facial expressions: the depth sensor, which generates an accurate three-dimensional model of a face, and the webcam with Face OSC, which gives a real-time two-dimensional image of the face. Taking implementation and financial capability into consideration, the latter technology was selected for this study. As described in the report, there are two main methods of recognising facial expressions by computer vision: geometric feature-based methods, which are the main logic for reading facial expressions from the motion of key points, and appearance-based methods, which are the principle by which the webcam program defines the key points on the face. A series of experiments is then presented. The first two explain how the author detects and collects facial data with Face OSC: starting with her own face, data such as the distance between the eyes under different facial expressions was gathered and analysed, and more faces from different cultural backgrounds were then collected to make the data more reliable. After that, three more experiments explored possibilities for the output: colours responding to different emotions, the density of geometric patterns reflecting arousal levels, and finally the degree of a smile controlling the motion of a reconfigurable structure. The last experiment developed all of this further: in the latest prototype, the author works on the physical deployable structure rather than just simulating the process in a computer program, and the degree of a smile is interpreted through air pressure so that it can drive the physical unit through air pockets.
For future development, the project would combine more deployable units into a responsive surface actuated by an air source, which could interact with humans like a living creature by detecting their facial expression and heart rate or skin humidity. It is acknowledged that the accuracy of facial expression recognition by computer vision is still restricted, partly by financial limitations; facial recognition technology today focuses more on high-end products. This project is an opportunity to explore facial expression detection with common technology, the webcam, an ordinary piece of equipment easily found in daily life, and it offers a way to work around limitations in this area. However, the webcam combined with Face OSC is still not perfect and has problems, e.g. in data translation. Hence, more experiments and design prototypes are required for further improvement.
Bibliography
Literature sources
[1] Nafus, D. (2016). Quantified. Cambridge, Massachusetts: The MIT Press.
[2] Russell, J., Lewicka, M. and Niit, T. (1989). A cross-cultural study of a circumplex model of affect. Journal of Personality and Social Psychology, 57(5), pp. 848-856.
[3] Cohen, I., Sebe, N., Garg, A., Chen, L. and Huang, T. (2003). Facial expression recognition from video sequences: temporal and static modeling. Computer Vision and Image Understanding, 91(1-2), pp. 160-187.
[4] Shan, C., Gong, S. and McOwan, P. (2009). Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image and Vision Computing, 27(6), pp. 803-816.
[5] Breazeal, C. (2009). Role of expressive behaviour for robots that learn from people. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1535), pp. 3527-3538.
[6] Ekman, P. (1993). Facial expression and emotion. American Psychologist, 48(4), pp. 384-392.
[7] Ekman, P., Sorenson, E. and Friesen, W. (1969). Pan-Cultural Elements in Facial Displays of Emotion. Science, 164(3875), pp. 86-88.
[8] Neeta, S. and Shalini, B. (2010). Facial Expression Recognition. International Journal on Computer Science and Engineering, Vol. 02, No. 05, pp. 1552-1557.
[9] Black, M. J. and Yacoob, Y. Recognizing Facial Expressions in Image Sequences Using Local Parameterized Models of Image Motion. International Journal of Computer Vision, 25(1), pp. 23-48.
[10] Essa, I. A. and Pentland, A. P. Coding, Analysis, Interpretation and Recognition of Facial Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 757-763.
[11] Cohn, J., Zlochower, A., Lien, J. and Kanade, T. Feature-Point Tracking by Optical Flow Discriminates Subtle Differences in Facial Expression. Proc. 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pp. 396-401.
[12] Tian, Y., Kanade, T. and Cohn, J. F. Recognizing Action Units for Facial Expression Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), pp. 97-115.
[16] Lyons, M., Akamatsu, S., Kamachi, M. and Gyoba, J. Coding Facial Expressions with Gabor Wavelets. In Proc. of 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 200-205.
[17] Pantic, M. and Rothkrantz, L. (2004). Facial Action Recognition for Facial Expression Analysis From Static Face Images. IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 34(3), pp. 1449-1461.
[18] Babajide, A., Hofacker, I., Sippl, M. and Stadler, P. (1997). Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Folding and Design, 2(5), pp. 261-269.
Online sources
[13] Creators. (2017). New Cinema: Dance Like Michael Jackson Using Your Face. [online] Available at: https://creators.vice.com/en_us/article/dance-like-michael-jackson [Accessed 5 May 2017].
[14] Behance.net. (2017). Behance. [online] Available at: https://www.behance.net/gallery/18707745/Emoting-Interactive-Installation-(Spring-2013) [Accessed 5 May 2017].
Code
[15] Kyle McDonald's FaceOSC. [online] Available at: https://github.com/kylemcdonald/ofxFaceTracke
List of figures
Fig. 1, 2. Fitbit.com. (2017). Fitbit Flex Wireless Activity & Sleep Wristband. [online] Available at: https://www.fitbit.com/uk/flex [Accessed 14 Jul. 2017].
Fig. 3. VagueWare.com. (2017). Top 10 Awesome Face Recognition Software For iPhone. [online] Available at: http://www.vagueware.com/face-recognition-software-for-iphone/ [Accessed 14 Jul. 2017].
Fig. 4. Yixiao L. (2017). Group task
Fig. 5. Yixiao L. (2017). Research method
Fig. 6. Der Greif. (2017). Duchenne de Boulogne and Paul Ekman – Visual Alphabets of Emotions – Der Greif. [online] Available at: https://dergreif-online.de/artist-blog/duchenne-de-boulogne-and-paul-ekman-visual-alphabets-of-emotions/ [Accessed 14 Jul. 2017].
Fig. 22, 23, 24. Yixiao L. (2017). Detect facial expression in Face OSC
Fig. 25, 26. Yixiao L. (2017). Face data
Fig. 27. Yixiao L. (2017). Face collection and relative data
Fig. 28. Yixiao L. (2017). Data analysis
Fig. 29. Yixiao L. (2017). Colour extraction from film
Fig. 30, 31, 32. Yixiao L. (2017). Experiment on Face OSC - Processing
Fig. 33. Seas.harvard.edu. (2017). A toolkit for transformable materials | Harvard John A. Paulson School of Engineering and Applied Sciences. [online] Available at: https://www.seas.harvard.edu/news/2017/01/toolkit-for-transformable-materials [Accessed 14 Jul. 2017].
Fig. 34. Notey. (2017). The Best Self-Folding Blogs - Notey. [online] Available at: http://www.notey.com/blogs/self_folding [Accessed 14 Jul. 2017].
Fig. 35. Yixiao L. (2017). Experiment on Face OSC - Rhino Grasshopper
Fig. 36. Yingchao L. (2017). Emotion analysis
Fig. 37. Yingchao L. (2017). Arousal level analysis
Fig. 38. Yixiao L. (2017). Pleasure level analysis
Fig. 39. Yixiao L. (2017). Time test on air pocket
Fig. 40. Yixiao L. (2017). Photos of latest prototype
Fig. 41. Yixiao L. (2017). Experiment on latest prototype