PICTURE-BASED RECOGNITION OF SMOKERS: A NOVEL VISUAL METHOD

It is important to obtain a deeper understanding of the social context of smoking, which may support finding new ways to hinder the development of a smoker’s identity. The authors developed the Picture-Based Recognition of Smokers (PBRS) method in order to understand the identity markers of the social and visual contexts related to adolescent smoking. The differences between traditional text-based questionnaires and PBRS in identifying non-smokers and smokers were compared in a discriminatory analysis conducted with comparison clouds and correlograms. The ability of these methods to predict adolescent smokers was tested with a regression model combined with permutation analysis. The comparison word clouds confirmed that non-smokers and smokers interpret the visual identity markers of the pictures differently. PBRS had a better success rate of predictions than the text-based questionnaires. This approach can support the development of preventive interventions that do not stigmatize the intervention group.


Introduction
Visual signs, forms and communication have been found to be effective ways of limiting positive perceptions of tobacco (Gallopel-Morvan, Patrick, Le Gall-Ely, and Rieunier, 2011; Lee, Cappella, Lerman, and Strasser, 2017; Drovandi, Teague, Glass, and Malau-Aduli, 2018) and of intervening in smoking behaviour (Haines-Saah, Kelly, Oliffe, and Bottorff, 2015). There is a clear need to further develop methods for recognizing the visual identity markers of smoking, because so far such identity markers have been developed and used mostly to market tobacco products. A better understanding of the factors that support smoking as an identity is critical if the aim is to develop smoke-free societies.
The development of a social identity is essential in smoking. The more an adolescent defines his or her identity through smoking, the more likely it is that the adolescent will start smoking (Hertel and Mermelstein, 2012). The development of a smoker's identity takes time, and for young people smoking can long remain a behaviour shaped by social context (Tombor, Shahab, Herbec, et al., 2015). Furthermore, adolescents who smoke may not see themselves as smokers, but instead consider themselves as belonging to the non-smoking group (Berg, Lust, Sanem, et al., 2009). In the literature, watching music videos on television, socializing with peers who smoke, and developing an identity that includes smoking have been linked with smoking behaviour (Slater and Hayes, 2010).
Recognizing visual representations connected to smoking is significant for developing methods that hinder the development of a smoker's identity. Individuals actively regulate their behaviour and may transform their identities over time. Among young people, peers are an important source of interpersonal influence. Situational influences also play an essential role in how adolescents perceive their peers, and adolescents may continually change their behaviour and identity according to the context. Therefore, the context and situation can further strengthen or weaken an adolescent's personal commitment to smoking (Pender, Murdaugh, and Parsons, 2011). Peers are also an important source of support in stopping smoking, for example by increasing one's capability to stop, by being the friend to whom one expresses the desire to stop smoking, and by helping to avoid cigarettes being available when one stops smoking (Pingree, Boberg, Patten, et al., 2004).
The research aimed to develop picture-based recognition of smokers (PBRS), a method that can lead to new contextual intervention methods based on Visual Research Methods (VRM). VRM are a collection of methods used to understand and interpret images (Balbale et al., 2014; Glaw, Inder, Kable, and Hazelton, 2017; Riviera, 2010; Rose, 2014). The visual materials used in VRM, such as photos, films and videos, demonstrate the diversity of visual culture. VRM can make visible the emotional, everyday and reflexive social contexts (Ranjit et al., 2019) that are otherwise difficult to operationalise yet contain a great deal of information about individual behaviour.
In this article, the authors outline PBRS, which uses a picture-based questionnaire to detect adolescents' visual identity markers of smoking, and analyse its strengths and weaknesses compared with traditional text-based questionnaires in detecting smoking among adolescents. The aim of PBRS is to find new ways to hinder the development of a smoker's identity among adolescents and thereby to promote and support non-smoking. This approach treats smoking as an identity that differentiates the life contexts and lifestyles of smokers and non-smokers, and it presumes that smoking and non-smoking adolescents interpret the visual identity markers of smoking differently. To test this assumption, the authors asked how the visual identity markers of smoking, operationalized through pictures, can help recognize adolescent smoking. The study deepens the understanding of PBRS's ability to detect adolescent smoking through visual identity markers by asking how the text-based questionnaire and PBRS differ in a discriminatory analysis of smokers and non-smokers.

PBRS. Data and methods
In this study, the authors develop PBRS in order to understand the visual identity markers of smoking among adolescents. The aim of PBRS is to take the first steps towards developing contextual intervention methods for smoking by using visual representations associated with smoking.
Substantial research evidence suggests that pictures are more effective than text in warning users and potential users about the health effects of tobacco (BinDhim, McGeechan, Alanazi, et al., 2017; Fong, Hammond, and Hitchman, 2009; Fong, Hammond, and Jiang, 2010; Hammond, 2011). Tobacco product regulations increasingly aim to prevent the use of visually attractive design features in tobacco products and their marketing that may influence users (World Health Organization, 2007). Photographs make it possible to understand social phenomena and identities more widely and accurately (Brazg, Bekemeier, Spigner, and Huebner, 2011) and to extract meaningful data from the target audience, and photo elicitation can help develop a unique understanding of the factors influencing healthy behaviour (Riviera, 2010). The benefit of PBRS is therefore that it can help identify, as visual contexts, the everyday habits of adolescents that can be difficult to put into words (Haines-Saah, Oliffe, White, and Bottorff, 2013), and thus PBRS can provide added value compared with traditional text-based questionnaires. PBRS refines the word-based questionnaire by using a picture-based approach, which is more directly connected with the perception of the identity markers of a smoker or non-smoker related to the social and visual context of smoking.

Data
The data for comparing the traditional text-based questionnaire and PBRS were compiled from two questionnaires. In the first questionnaire, 120 students from a vocational school were asked to write briefly about 40 different pictures dealing with the social and physical environment and with moods and feelings. The themes of the pictures were based on research on the escalation of smoking (Berg, Lust, Sanem, et al., 2009; Dunne, Bishop, Avery, and Darcy, 2017; Fellows, 2015; Hertel and Mermelstein, 2012; Slater and Hayes, 2010; Tombor, Shahab, Herbec, et al., 2015), and the selection of the pictures was based on exploring the adolescents' own culture.
In the first questionnaire, the respondents were shown one picture at a time, for four seconds each, in a computer classroom. Every picture came with the same instruction: look at the picture, think about what it mainly means to you, and focus on its most important message.
After seeing each picture, the respondents used Google Forms to write down their own thoughts about what the picture and its visual identity markers meant to them. The answer time for each picture was about 30 seconds. Of the vocational school students, 71.7% (n = 86) did not use tobacco products, 17.5% (n = 21) used tobacco products, and 10.8% (n = 13) used tobacco products occasionally. The latter two groups were combined into a single group in the later analyses.
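As a quick arithmetic check, the shares reported above can be recomputed from the group counts, and combining users and occasional users corresponds to a binary smoker indicator. The sketch below is illustrative only; the group labels and function names are hypothetical.

```python
# Group counts reported for the first questionnaire (N = 120).
COUNTS = {"non-user": 86, "user": 21, "occasional user": 13}


def shares(counts):
    """Return each group's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


def smoker_dummy(status):
    """Code regular and occasional users as smokers (1) and non-users as 0."""
    return 0 if status == "non-user" else 1


# Expand the counts into one status per respondent, then apply the coding.
statuses = [group for group, n in COUNTS.items() for _ in range(n)]
smokers = sum(smoker_dummy(s) for s in statuses)

print(shares(COUNTS))  # {'non-user': 71.7, 'user': 17.5, 'occasional user': 10.8}
print(smokers, round(100 * smokers / len(statuses), 1))  # 34 28.3
```

The combined smoker group therefore comprises 34 of the 120 respondents (28.3%).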
The second questionnaire had two parts. In the first part, 82 vocational school students were asked how much they felt each picture represented their own life, attitudes, and events in their life. The second part used the students' own words and descriptions of the pictures, gathered in the first questionnaire, to create a text-based questionnaire about the visual identity markers of smoking that did not include pictures. The questions were intended to have the same meaning as in the picture-based PBRS, but single words and meanings were used instead of pictures. The students completed the second questionnaire either at school or in their free time. Both parts applied a Likert-type scale from 1 to 6, and the responses were collected in Google Forms. The same 82 students responded to both parts; 52.4% (n = 43) did not use tobacco products, 39% (n = 32) used tobacco products, and 8.5% (n = 7) used tobacco products occasionally. Smokers and occasional smokers were again combined into a single group.

Methods comparing text-based questionnaire and PBRS
The first stage of the comparison focused on the data from the first questionnaire, and text mining was used to generate comparison clouds of the PBRS responses of smokers and non-smokers. The comparison clouds were used to demonstrate different interpretations of the visual contexts and identity markers of the pictures between smokers and non-smokers, and to select pictures for further analysis in the second questionnaire. For each picture, a comparison cloud compares the relative frequencies of the terms used by each group; rather than simply merging separate word clouds, it plots the differences in word usage between non-smokers and smokers. The comparison cloud function is found in R's wordcloud package (Wei, 2017). The pictures were selected at the first stage by interpreting the comparison clouds, and only pictures showing a clear difference between smokers and non-smokers were included in the second questionnaire.
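The scoring idea behind a comparison cloud can be sketched in a few lines. Assuming, as we read the documentation of R's `comparison.cloud()`, that each word is sized by the largest amount by which its relative frequency in one group exceeds its mean relative frequency across groups, and is placed in that group's sector, the Python function below computes that score. The function name and the toy answers are hypothetical, the plotting step is omitted, and whitespace tokenization is a simplification (a phrase such as "having beer" is split into two words).

```python
from collections import Counter


def comparison_scores(responses_by_group):
    """Return {word: (group, deviation)} for a comparison cloud.

    For each group, word rates are relative frequencies within that group's
    responses. A word is assigned to the group where its rate most exceeds
    the mean rate across groups; that excess would set its plotted size.
    """
    rates = {}
    for group, responses in responses_by_group.items():
        counts = Counter(w for text in responses for w in text.lower().split())
        total = sum(counts.values())
        rates[group] = {w: n / total for w, n in counts.items()}

    vocabulary = set().union(*rates.values())
    scores = {}
    for word in vocabulary:
        per_group = {g: r.get(word, 0.0) for g, r in rates.items()}
        mean = sum(per_group.values()) / len(per_group)
        group = max(per_group, key=lambda g: per_group[g] - mean)
        scores[word] = (group, per_group[group] - mean)
    return scores


# Hypothetical free-text answers to the picture of beer.
answers = {
    "smokers": ["friday drunk", "having beer friday"],
    "non-smokers": ["beer", "alcohol beer"],
}
for word, (group, dev) in sorted(comparison_scores(answers).items()):
    print(f"{word:8s} {group:12s} {dev:.3f}")
```

Because scores are relative to the across-group mean, a word used equally by both groups gets a deviation near zero and would be drawn small, which is what distinguishes a comparison cloud from two side-by-side word clouds.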
In the second stage of the comparison, the associations between the PBRS pictures and the text-based responses from the second questionnaire were analysed with correlograms. A correlogram visualizes the correlation matrix of the visual identity markers related to the pictures and the words from the text-based questionnaire, and here it simultaneously demonstrates the differences between smokers and non-smokers. The correlograms were created in R with the corrplot package (Kleiber and Zeileis, 2008).
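The rule that a correlogram shows a balloon only for statistically significant correlations can be sketched as follows. This is an illustration under assumptions, not the corrplot implementation: it uses Pearson correlations between Likert-scale columns and a caller-supplied two-sided critical t value for n − 2 degrees of freedom (for example about 2.02 for α = 0.05 with 40 degrees of freedom). The column names and ratings are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def significant_correlations(columns, t_crit):
    """Return {(a, b): r} for pairs whose |t| >= t_crit, where
    t = r * sqrt((n - 2) / (1 - r^2)). Omitted pairs would get no balloon."""
    names = list(columns)
    n = len(columns[names[0]])
    kept = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if math.isnan(r):
                continue
            t = math.inf if abs(r) >= 1 else abs(r) * math.sqrt((n - 2) / (1 - r * r))
            if t >= t_crit:
                kept[(a, b)] = round(r, 3)
    return kept


# Hypothetical 1-6 ratings from six respondents; df = 4, so t_crit is about 2.776.
ratings = {
    "picture": [1, 2, 3, 4, 5, 6],
    "beer": [2, 2, 3, 5, 5, 6],
    "weekend": [3, 1, 4, 1, 5, 2],
}
print(significant_correlations(ratings, t_crit=2.776))  # {('picture', 'beer'): 0.962}
```

Running this separately on the smokers' and the non-smokers' responses would give the two panels compared for each picture.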
Finally, in the third stage of the comparison, the ability of PBRS and the text-based questionnaires to detect smoking and non-smoking adolescents was compared with two regression models. The text-based regression model uses permutation to select its independent variables; this made it possible to form regression models with as many text-based independent variables as there were pictures in the PBRS model. In simplified form, the permutation method proceeds as follows: (1) take a sample (random or with probabilities) of the explanatory variables, (2) recalculate the regression model using the sampled variables, (3) save the outcome (the Nagelkerke R² and the success rate of predictions), (4) repeat steps 1–3 a total of 2000 times, and (5) visualize and describe the results (Jones, Maillardet, and Robinson, 2009; Faraway, 2006).
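The five steps can be sketched as a loop. This is an illustration under stated assumptions rather than the authors' code: `fit` is a hypothetical callable that refits the regression on the sampled text-based variables and returns the Nagelkerke R² and the success rate; `k` would be the number of pictures in the PBRS model; `weights=None` corresponds to random selection, and non-negative weights (for example absolute correlations with the picture-based items, which is our assumption) to selection with probabilities. The fixed seed only makes runs reproducible.

```python
import random
import statistics


def weighted_sample(rng, items, k, weights):
    """Draw k distinct items, each draw proportional to the remaining weights."""
    pool = list(zip(items, weights))
    chosen = []
    for _ in range(k):
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool])[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


def permutation_study(variables, k, fit, n_iter=2000, weights=None, seed=0):
    """Steps (1)-(5) of the permutation method.

    fit(chosen) must refit the regression on the chosen variables and return
    (nagelkerke_r2, success_rate); it is a hypothetical stand-in here.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_iter):                                       # step 4
        if weights is None:
            chosen = rng.sample(variables, k)                     # step 1, random
        else:
            chosen = weighted_sample(rng, variables, k, weights)  # step 1, weighted
        outcomes.append(fit(chosen))                              # steps 2-3
    return {                                                      # step 5: summarize
        "mean_r2": statistics.mean(o[0] for o in outcomes),
        "mean_success_rate": statistics.mean(o[1] for o in outcomes),
        "outcomes": outcomes,
    }


# Toy stand-in for the regression: the "fit" improves with informative words.
GOOD = {"w1", "w2"}


def toy_fit(chosen):
    share = len(GOOD & set(chosen)) / len(chosen)
    return share, 0.5 + 0.3 * share


words = [f"w{i}" for i in range(1, 11)]
uniform = permutation_study(words, k=3, fit=toy_fit)
weighted = permutation_study(words, k=3, fit=toy_fit, weights=[5, 5] + [1] * 8)
print(round(uniform["mean_r2"], 3), round(weighted["mean_r2"], 3))
```

Weighting the draw towards variables that resemble the picture items, as in the second call, is what lets a permuted text-based model approximate the picture-based model as closely as possible.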
In step two, generalized linear models with a binomial distribution were used for regression modelling (Aitkin, Francis, Hinde, and Darnell, 2009; Lumley, 2017). The dependent variable in the regression model was a dummy variable indicating whether the adolescent was a smoker (coded as 1) or a non-smoker (coded as 0). The independent variables were the answers to the picture-based or the text-based questionnaire, depending on the model. The regression models' abilities to detect smokers and non-smokers were compared on the basis of the Nagelkerke pseudo-R² and classification accuracy.
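The regression step and the Nagelkerke pseudo-R² can be illustrated with a small pure-Python sketch. It assumes a binomial model with a logit link (logistic regression) fitted by Newton-Raphson, and computes Nagelkerke's R² as the Cox-Snell R², 1 − exp(2(ll0 − ll1)/n), divided by its maximum, 1 − exp(2·ll0/n), where ll0 and ll1 are the log-likelihoods of the intercept-only and fitted models. It does not handle perfectly separated data, and the toy ratings are hypothetical; in practice a statistics package would be used.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Logistic regression (binomial GLM, logit link) by Newton-Raphson.

    X holds one row of predictors per respondent (no intercept column);
    y holds 1 for smokers and 0 for non-smokers.
    Returns (coefficients, fitted probabilities).
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
        score = [sum((yi - mi) * r[j] for r, yi, mi in zip(rows, y, mu)) for j in range(p)]
        info = [[sum(mi * (1 - mi) * r[j] * r[k] for r, mi in zip(rows, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    mu = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r)))) for r in rows]
    return beta, mu


def log_likelihood(y, mu):
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi) for yi, mi in zip(y, mu))


def nagelkerke_r2(y, mu):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    n = len(y)
    ll0 = log_likelihood(y, [sum(y) / n] * n)  # intercept-only model
    ll1 = log_likelihood(y, mu)                # fitted model
    cox_snell = 1 - math.exp(2 * (ll0 - ll1) / n)
    return cox_snell / (1 - math.exp(2 * ll0 / n))


# Hypothetical 1-6 ratings for one picture, and smoking status.
X = [[1], [2], [2], [3], [3], [4], [4], [5], [5], [6]]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
beta, mu = fit_logistic(X, y)
print(round(nagelkerke_r2(y, mu), 3))
```

The rescaling by the maximum is what lets the Nagelkerke value reach 1, which the Cox-Snell R² cannot do for a binary outcome.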

Identity markers of the visual contexts of smoking
In the first stage of the comparison, the differences in the identity markers of the visual contexts between smokers and non-smokers were examined with comparison word clouds. Comparison clouds were generated for all 40 pictures, but only three, those representing beer, reading and smoking, were selected for further analysis (Figure 1). These pictures were selected because the comparison clouds showed the greatest differences between smokers' and non-smokers' interpretations of the identity markers in these visual contexts.
The comparison cloud visualizes the words that are characteristic of smoking and non-smoking adolescents. The left panel of Figure 1 shows that smokers associate the picture of beer with identity markers described by words like "Friday," "drunk," and "having beer" more often than non-smokers, who seem to associate the picture with more neutral words without any specific identity markers, such as "beer" or "alcohol". The centre panel of Figure 1 shows that smokers more often associate the picture of reading with the negative identity marker "nothing came to mind," whereas non-smokers associate the picture more frequently with positive activities and identities such as "thoughts," "thinking," and "reading". The right panel of Figure 1 shows that non-smokers associate the picture of the smoking character more often with negative identities of smoking, such as "depressed," "corrupted," and "cancer", than smokers, who associate the picture with identities such as "tough," "gangsta," and "relaxation", which describe a particular kind of identity marker associated with smoking. The comparison clouds demonstrate that smokers and non-smokers interpret and identify different identity markers in the visual contexts of the pictures, and these differences clearly distinguish the smoking habits of adolescents.

Associations between PBRS and text-based identity markers
At the second stage of the comparison analysis, the associations between PBRS and text-based identity markers were analysed with correlograms in order to evaluate the similarity of the identity markers between the two types of questionnaire, and between smoking and non-smoking adolescents (Figure 2). The correlogram makes it possible to show differences in how smokers and non-smokers interpret and identify the identity markers. In the correlograms, the colour of each balloon shows the strength of the correlation and its size indicates statistical significance. Where there is no coloured balloon, the correlation is not statistically significant.
The correlogram for the picture of beer shows clear differences in identity markers between smoking and non-smoking adolescents. According to the correlation analysis, non-smokers do not associate the picture of beer with the words "sauna night" or "weekend", which indicates that non-smokers also use many other words that are uncorrelated with the other words in the correlogram (Figure 2a). For instance, the word "drunk" is not correlated with "weekend" or "sauna night", and "weekend" is not correlated with "beer" or "frosty beer". The correlogram of the smokers includes only statistically significant correlations (Figure 2a); for instance, the word "beer" is highly correlated with "drunk," "weekend," and "sauna night." These differences indicate that smokers identify the picture of beer with visual identity markers related to enjoyment more frequently than non-smokers do. The correlogram for the picture of reading is presented in Figure 2b. Again, the difference between non-smokers and smokers is clear. For non-smokers, the visual identity markers of the picture and the single words are not correlated at all, indicating that they interpret the picture and the words heterogeneously (Figure 2b). Smokers have more homogeneous interpretations of the picture, which correlates with words such as "thoughts," "thinking," "study," "school," "aims," and "useless" (Figure 2b). The correlogram suggests that smokers associate the picture of reading with somewhat more negative visual identity markers than non-smokers do.
In the correlogram for the picture showing someone smoking, smokers associate the picture only with the word "depressed," while for non-smokers there is a correlation with the words "school," "tough," and "cancer" (Figure 2c). In short, smokers and non-smokers identify different identity markers in the visual contexts. Non-smokers associate the visual context of the picture with health-related identity markers, while smokers seem to detect the 'closedness' of the character in the picture.

Detecting smoking among adolescents with PBRS
The third stage of the analysis compares the abilities of the text-based questionnaire and PBRS to predict adolescent smokers through regression modelling. The results of the regression model demonstrate that the identity markers from the visual contexts of the pictures of beer, reading and smoking can be used to detect whether an adolescent smokes (Table 1). If, for instance, the adolescent recognizes the visual context of the picture of beer as an identity marker of free time, or sees the picture of smoking as visibly reflecting the identity of smoking, the adolescent is more likely to be a smoker or to be at high risk of starting to smoke. There is also weak evidence that the risk of smoking decreases if the adolescent relates the visual context of the picture of reading to his or her own perceptions of reading. Together, these three picture-based identity markers yield a Nagelkerke pseudo-R² of almost 0.45 (Table 1), which is a moderate value considering the number of independent variables in the regression model. With a probability threshold of 0.5, the PBRS model correctly classifies almost 81% of smokers and non-smokers, and is thus much more accurate than random guessing. Table 1 also contains the regression results for the text-based questionnaires. The best text-based model comes close to the PBRS model: its R² is only 2 percentage points lower and its prediction success rate only 1% lower than the equivalent indicators of the picture-based model (Table 1). However, the permutations show that the average text-based model has a much lower R² and success rate than PBRS: in the average random model, the R² is 8 percentage points lower and the success rate 6% lower than in the picture-based model (Table 1). Figure 3 shows the permutation results from the text-based regression models.
In model A, the independent variables from the text-based questionnaire are included in the regression model at random, whereas in model B they are selected by probability sampling, using their correlations with the picture-based questionnaire as sampling probabilities. Permutation model B thus aims to be as similar as possible to the picture-based model. The permutation results in Figure 3 demonstrate that PBRS has a better fit and a higher success rate of predictions than the permuted text-based models. For instance, with permutation model A, the Nagelkerke R² exceeds the observed PBRS R² in only three of the 2000 permutations (Figure 3). Hence it appears very unlikely that a text-based questionnaire would achieve a higher R² or success rate.
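The comparisons in this section rest on a few simple computations: the success rate of predictions at a probability threshold of 0.5, a random-guessing baseline, and the number of permutations in which a text-based model's R² exceeds the observed PBRS R². The sketch below shows them under assumptions: "random guessing" is modelled here as guessing "smoker" with probability equal to the observed share of smokers (one possible baseline among several), and all inputs are hypothetical toy values, not the study's data.

```python
def success_rate(probabilities, outcomes, threshold=0.5):
    """Share of respondents whose predicted class (smoker if p >= threshold)
    matches the observed class (1 = smoker, 0 = non-smoker)."""
    hits = sum((p >= threshold) == (y == 1) for p, y in zip(probabilities, outcomes))
    return hits / len(outcomes)


def random_guess_rate(outcomes):
    """Expected accuracy when guessing 'smoker' with probability equal to the
    observed smoker share q: q*q + (1 - q)*(1 - q)."""
    q = sum(outcomes) / len(outcomes)
    return q * q + (1 - q) * (1 - q)


def exceedances(permuted_r2, observed_r2):
    """Number and share of permutations whose R^2 exceeds the observed value."""
    count = sum(r2 > observed_r2 for r2 in permuted_r2)
    return count, count / len(permuted_r2)


probs = [0.9, 0.2, 0.6, 0.4, 0.7]
obs = [1, 0, 0, 0, 1]
print(success_rate(probs, obs))                             # 0.8
print(round(random_guess_rate(obs), 2))                     # 0.52
print(exceedances([0.1, 0.5, 0.2, 0.3], observed_r2=0.45))  # (1, 0.25)
```

For example, 3 exceedances among 2000 permutations correspond to a share of 0.0015.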

Discussion and conclusions
According to the results, it is possible to recognize the different visual identity markers of smoking and non-smoking adolescents with PBRS and to use this information to classify adolescents as smokers or non-smokers for targeting with further health promotion interventions. The physical and visual contexts, together with social smoking behaviour, reinforce and maintain smokers' and non-smokers' identities.
By using visual markers, it is possible to depict efficiently the social contexts, environments, life situations and moods that are connected to adolescents starting to smoke or smoking occasionally. This can be used to prevent the development of a smoker's identity. In this study, the visual contexts of the pictures were effective indicators for recognizing the smoking behaviour of adolescents. The visual survey appears to reveal subconscious behaviour of adolescents, especially in social situations, while the correlogram points to stress factors connected to smoking as a negative visual identity marker. These results highlight the challenge of developing smoking interventions that take into account the adolescents' developmental phase, how they process their life situations, and how these challenges are connected to their smoking. Visible, social or stressful situations that trigger smoking could be included in preventive computer games in which players can reflect on them in relation to their own lives.
In this study, in the correlogram for the picture showing someone smoking, smokers associated the picture only with the word "depressed," whereas non-smokers' responses correlated with the words "school," "tough," and "cancer". Recent studies have found that cigarette smoking and depressive symptoms are reciprocally associated during the developmental period from adolescence to adulthood (Ranjit et al., 2019).
The results show that PBRS can effectively identify adolescent smokers and that it can be an even more effective research tool than the traditional text-based survey. The findings confirm that visual research methods hold considerable potential. They can provide a better understanding of the visual contexts of smoking in the future and have significant potential for developing better insights into smoking. One of the strengths of visual research methods is that they can be used to develop health-promoting interventions that do not stigmatize the intervention group. An advantage of PBRS, as of other visual methods, is that it seems able to highlight identity markers related to subconscious health behaviour better than verbal questionnaires. Another important aspect is that adolescents do not have the same prejudices toward PBRS as toward traditional health surveys.
The PBRS results demonstrated a clear difference in the identity markers of smokers and non-smokers. The smokers associated the picture of reading with the feeling that "nothing came to mind", whereas non-smokers associated it with activities such as "thinking" and "reading". Earlier studies have found that poor academic results increase smoking among adolescents (Albert-Lőrincz, Paulik, Szabo, Foley, and Gasparik, 2019). The smokers also associated the picture of beer with words like "Friday" and "drunk" more often than non-smokers, and they associated the picture showing someone smoking only with the word "depressed", whereas for non-smokers there was a correlation with the words "school," "tough," and "cancer". In earlier studies, the type of images selected for the questionnaires, including differences in the qualities and characteristics of the images themselves, was a distinguishing factor (Reynolds-Keefer and Johnson, 2011). In this study, the students' own single words, descriptions and meanings of the pictures, gathered during the first questionnaire, were used to create a text-based questionnaire about the visual identity markers of smoking. This increases the cultural validity of the study (Lynch, Lerner, and Leventhal, 2013) and highlights what these findings mean for implementing preventive practices (Lund, 2014).
The visual inquiry revealed differences between non-smoking and smoking adolescents in the concepts connected to several visual contexts. Derived through text mining, the comparison clouds consisted of words that revealed characteristic visual contexts for non-smoking and smoking adolescents. The differences in word use provided information about both non-smoking and smoking adolescents. The content and meaning of the word clouds can be used in planning smoking interventions, for instance at small-group level in upper secondary or vocational schools, where discussion and peer support can help to recognize individual characteristic visual contexts and smoking behaviour, possibly at an early stage. The participant-focused starting point of such studies provides information for health promotion planning. Haines-Saah et al. (2015) studied participant-driven photo-narratives about smoking and the process of stopping smoking in the socio-spatial context.
The differently interpreted identity markers of smokers and non-smokers underline the strength of VRM in detecting smoking among adolescents. The results create the basis for planning, targeting and group selection of smoking-prevention actions among adolescents. In general, the challenges of PBRS and VRM relate to the cultural connectivity of the pictures and the multiplicity of images. Despite these weaknesses, the results of this article support and encourage the development and use of games and other visual methods in the prevention of adolescent smoking.

Funding details
This work was supported by the Ministry of Social Affairs and Health, Finland, under a grant from the Health Promotion Allowance (49/9.02.02/2017). The sponsor did not influence the research work in any way.