www.ssoar.info

Relations between measures of executive functions and self-regulation in preschoolers

Performance on different measures of executive functions (EF) and self-regulation (SR) does not always correspond to the behaviour children show in real-life situations. The present study assesses the relationships between performance on different EF and SR measures and teacher ratings of children's self-control and thoughtfulness. In total, 217 children aged between 34 and 72 months (54% boys) were assessed. Four tests measuring cognitive EF (Digit Span backward, Block Recall, Day-Night Stroop, Hearts & Flowers), one test measuring behavioural EF (Tower) and one test measuring behavioural SR (Head-Toes-Knees-Shoulders [HTKS]) were administered. Additionally, teachers rated the dimension 'self-control and thoughtfulness' of the German observation scale 'Social-emotional well-being and resilience of children in early childhood settings' (PERiK). All measures differentiated by age across the range of three to six years. Correlations of the cognitive EF measures with the HTKS were almost twice as high as their correlations with the Tower, indicating that the HTKS taps processes similar to those tapped by the cognitive EF measures. Teacher ratings did not correlate more highly with behavioural EF and SR measures than with cognitive EF measures, and behavioural EF and SR measures did not predict teacher ratings better than cognitive EF measures did. This article discusses to what extent distinctions among measures of EF and SR are possible and useful.

Keywords: executive functions, self-regulation, test instruments, cognitive development, kindergarten

1 Introduction and aim

Self-regulation (SR) can be broadly defined as goal-directed behaviour, typically within at least a minimal temporal perspective (Hofmann/Schmeichel/Baddeley 2012). The term refers to the ability to manage emotions, direct thoughts, and regulate and adapt behaviour (Blair/Razza 2007;Smith-Donald u.a. 2007). In contrast, the term self-control (SC) is often used to describe mainly the ability to inhibit responses. A common example is overriding an automatic response in order to activate an alternative, more promising response that serves a certain goal (Diamond 2013;Hofmann/Schmeichel/Baddeley 2012). Both SR and SC are closely connected to executive functions (EF), an umbrella term for cognitive processes of the prefrontal cortex (Duncan 1986;Luria 1976) that are necessary for goal-directed behaviour. The term EF encompasses a heterogeneous set of cognitive skills such as inhibitory control, working memory, cognitive flexibility, attention, planning, reflection and error detection (Anderson 2010;Welsh/Pennington/Groisser 1991;Zelazo u.a. 1997). These skills are essential for processes such as effortful control and selective attention, and for responding adaptively to novel or challenging situations in which automatic, overlearned responses are inadequate or nonexistent (e.g. Miller/Cohen 2001;Zelazo u.a. 2003).
Three of the skills subsumed under the term EF are considered central to most of the others (e.g. Hughes 1998;Welsh/Pennington/Groisser 1991). Following the framework of Miyake u.a. (2000), known as the unity and diversity model of EF (Miyake/Friedman 2012;Miyake u.a. 2000), these skills are (updating of) working memory, inhibitory control and cognitive flexibility (shifting). Working memory describes the ability to hold, update and monitor information mentally and is therefore crucial for planning and problem solving (Baddeley 1986). Inhibitory control is the ability to resist a first impulse and to reflect before acting in order to achieve a desired goal. It is also needed to block out interference in order to stay focused (Rothbart/Posner 1985). Cognitive flexibility (also called shifting) is involved when going back and forth between tasks or mental sets and when adjusting to novel situations. It is also important for taking on a new perspective and discovering different ways to approach a problem (Diamond 2007;Diamond u.a. 2007).
2 Assessing executive functions, self-regulation and self-control

Choosing the right measures to assess EF, SR and SC is not an easy undertaking given the vast variety of existing instruments (Carlson 2005). Any good measure has to fulfil certain criteria, and these criteria can help identify useful assessment methods. Firstly, the measure should be appropriate for the age group of interest. Some measures lack relevance to children, which makes it hard for them to stay on task, affects their performance and might keep them from reaching their full potential (Anderson u.a. 2002;Carlson 2005). Other measures are time-consuming or complicated to administer (Ponitz u.a. 2008), making them unsuitable for some age groups or for larger samples. Secondly, the measures should tap the construct of interest as exclusively as possible and keep to a minimum the influence of other competencies, such as language skills, attention or memory capacity, that could moderate performance (Anderson u.a. 2002;Espy u.a. 2008). Thirdly, impairments in one domain of EF do not necessarily imply that other executive domains are also impaired. Hence, it is useful to include multiple measures, especially when measuring EF, in order to tap the different domains across different modalities (Anderson/Reidy 2012). It is recommended to use several domain-general and domain-specific measures as well as different levels of analysis (e.g. neurological, physiological, rating measures and questionnaires) (McClelland u.a. 2015). Last but not least, the measures used should be reliable and valid, e.g. internally consistent and temporally stable in their results. One aspect that is often neglected when choosing appropriate measures is their ecological validity. To ensure ecological validity, Gioia and Isquith (2004) recommend using both standardized tests that assess individual components and observations of the behavioural application of EF in real-world contexts.

Performance-based measures
Many studies assessing EF use performance-based tests, administered by an examiner, that measure EF under highly standardized conditions and usually assess accuracy and/or response time (Pennington/Ozonoff 1996;Toplak/West/Stanovich 2013). Widely used tests include, for example, the Dots task (Diamond u.a. 2007;Shing u.a. 2010), the Wisconsin Card Sorting Test (Heaton 1993) and the Stroop test (Jensen/Rohwer 1966;Stroop 1935). However, performance on such psychometric cognitive tests might not be a sufficient indicator of behaviour shown in everyday life (Anderson u.a. 2002). The tests are administered in a quiet, one-on-one, structured setting with minimal distractions, with the test administrator supporting the child to stay on task and finish each test (Stuss/Alexander 2000). This highly standardized, laboratory-like testing situation is hardly comparable to situations at home or in classrooms (Anderson 1998;Falk/Heckman 2009;Rimm-Kaufmann u.a. 2009). This raises doubts about whether the performance shown on tests administered in such controlled environments reflects behaviour shown in real-life situations. This lack of translation into everyday behaviour makes more applied behavioural measures, as well as teacher or parent ratings, very important. Some performance-based measures do assess children's performance in tests that resemble real-life situations. However, such measures rarely assess EF itself but rather children's application of EF, in other words SR (Toplak/West/Stanovich 2013). Whereas EF measures often aim at assessing the three components in isolation, SR measures often rely on the application of EF as a whole. This might increase ecological validity but gives less insight into the development of the individual components.
One example of a more realistic EF measure is the Tower (Kochanska u.a. 1996), which assesses inhibitory control. The task is embedded in a standard play situation in which the child is asked to take turns with the test administrator. Hence, although all are administered in standardized, one-on-one situations, performance-based measures of EF can differ in how closely they resemble real-life situations and, putatively, in their ecological validity.
Compared to EF measures, there are more performance-based SR measures that resemble realistic situations. One example of an assessment that aims to tap the translation of cognitive functioning into behaviour is the Head-Toes-Knees-Shoulders task (HTKS; Ponitz u.a. 2008;Ponitz u.a. 2009). The extended version of the HTKS task, which its authors identify as a behavioural measure of SR, clearly specifies the three individual components of EF: inhibitory control, working memory and cognitive flexibility (Ponitz u.a. 2008). In the task, the child has to refrain from following commands given by the test administrator and do something else instead. After some time, the rules of the game are changed, requiring the child to apply the new rules in its behaviour.
So far, there is little agreement on how to label measures of EF and SR that assess the two constructs in more realistic ways, in order to distinguish them from measures that assess EF and SR with tasks further removed from real-life situations. The two examples given above, the Tower as a measure of inhibitory control and the HTKS as a measure of behavioural SR, make it clear that the distinction between measures of EF and SR does not always coincide with the distinction between more and less realistic measures.

Hot and cool measures
One way of distinguishing measures is into 'hot' and 'cool' measures. The distinction is based on the assumption that some measures are more motivationally and emotionally relevant to the test taker than others and therefore tap hot EF (Zelazo/Carlson 2012;Zelazo u.a. 2005). Hot EF measures also often involve a social component, although a social aspect alone does not necessarily mean that a task assesses hot EF (Zelazo u.a. 2005). Hot measures of EF can also be quite artificial in nature and do not necessarily resemble real-life situations. Measures that are of little personal relevance, often presenting abstract and emotionally neutral tasks, aim at assessing cool EF (Zelazo u.a. 2005). Accordingly, measures such as the Tower might fall into the category of hot EF measures, as the Tower contains quite a strong motivational and emotional component: the child's wish to finish the tower. The Tower also has the social component of taking turns with another person, which makes it emotionally demanding for the child not to get frustrated with the slow progress and to wait for the other person to take his/her turn. The categorization of the HTKS as 'hot' or 'cool', however, is not as straightforward. One could argue that the HTKS has some motivational relevance, as it resembles a game and the child wants to do well at it. However, as the task does not involve any feedback, the child does not necessarily know whether it is doing well or not. Also, the task does not hold a social component, as the child is playing alone and carrying out the commands given by the test administrator. In this regard, it more closely resembles tasks such as the Wisconsin Card Sorting Test (Heaton 1993) or the Stroop test (Jensen/Rohwer 1966;Stroop 1935).
What sets the HTKS apart from most measures, however, is that the child has to act out a behavioural response rather than just point to a certain box or call something out. Hence, its authors call it a behavioural measure. In contrast, most of the other EF measures named above fall into the category of cognitive EF measures (Bierman u.a. 2008;Pennington/Ozonoff 1996;Ritter u.a. 2014).

Rating scales
The term behavioural measure, however, is used almost exclusively for rating scales. Like measures of EF and SR that assess the constructs in more realistic ways, rating scales are intended to measure the extent to which certain behaviours or competences are shown in complex, everyday situations (Roth/Isquith/Gioia 2005). Some researchers argue that rating scales are vital for gaining insight into a child's executive and self-regulatory functioning (Isquith u.a. 2005). Ratings usually involve an informant reporting on how well the child carries out everyday tasks related to EF (Miranda u.a. 2015). One of the most commonly used rating scales in the domain of EF has been the Behavior Rating Inventory of Executive Function (BRIEF; Gioia u.a. 2000). Because such ratings are based on everyday behaviour, their ecological validity may be higher than that of neuropsychological measures administered in standardized test sessions (Anderson u.a. 2002). Several studies have shown that performance on cognitive EF tests does not always correspond to performance levels on behavioural measures and ratings (Anderson u.a. 2002;Ponitz u.a. 2009;Vriezen/Pigott 2002). A review by Toplak/West/Stanovich (2013) based on 20 studies, for example, showed that of the 286 correlations between performance-based and rating measures of EF, only 68 (24%) were statistically significant, with a low overall median correlation of r = .19.
To summarize, different methods exist to operationalize executive and self-regulatory competences in children. They can be divided into performance-based measures assessed at the child level and teacher or parent ratings. Regarding the construct targeted, performance-based measures and ratings can aim at measuring the subcomponents of EF individually (i.e. working memory, inhibitory control, cognitive flexibility) or in an integrative way. Performance-based measures can differ in how closely they resemble real-life situations and thereby, putatively, in their ecological validity. They also differ in their emotional, motivational and social valence as well as in the behavioural level of the participant's response.

Research Questions
The present study assesses the relations between different measures of EF, SR and SC. It attempts to answer three research questions: (1) To what extent are teacher ratings of SC associated with cognitive measures assessing EF components in isolation and with integrative, more behavioural EF and SR measures? It was hypothesized that all tasks would be positively and significantly related to each other. However, it was assumed that measures within each group (integrative behavioural EF and SR measures on the one hand, cognitive EF measures on the other) would correlate more highly with each other than with measures of the other group. Integrative EF and SR tasks were also expected to correlate more highly with teacher ratings than the distinct cognitive EF tasks. (2) Is the distinction between integrative behavioural EF and SR measures and cognitive EF measures supported by a principal components analysis (PCA)? It was hypothesized that the PCA would support this theory-based distinction, yielding two components. (3) Do behavioural or cognitive measures of EF better predict teacher ratings of self-control and thoughtfulness? It was hypothesized that, in a multiple regression, behavioural measures would explain more variance in teacher ratings than cognitive EF measures.

Participants
The data used for the following analyses were gathered in the pre-test of the intervention study 'EMIL', which evaluates a program to improve self-regulation in pre-school children. In total, 217 children between 34 and 72 months (54% boys, M age = 53 months, SD age = 10.63) participated.
Children were nested in eight pre-schools (range: 15-47 children per school) located on the outskirts of a medium-sized German city in the state of Baden-Württemberg. All eight pre-schools worked according to an open concept called 'infans', which does not involve fixed classrooms but different learning areas (Andres/Laewen 2011). Each child is assigned to one teacher as the main caretaker, who is responsible for the child's adaptation when entering pre-school at age three, for keeping track of the child's development and for communicating with the parents.

Instruments
The measures included a battery of cognitive and behavioural tasks as well as teacher ratings of constructs related to self-regulation. The distinction into cognitive EF and behavioural EF and SR measures is based on the theory behind the tasks provided by the authors.
Digit Span backward (Petermann/Petermann 2008). The Digit Span backward test assesses phonological working memory. The child has to repeat a sequence of digits in backward serial order. Lists of the digits one to nine were read aloud by the test administrator at a rate of one digit per second. Following a short practice session, the test administrator read out a maximum of four lists of each length, starting with two digits. List length was increased by one digit when the child recalled three lists of the same length correctly. Testing continued until the child recalled two lists of one length incorrectly. The number of lists correctly recalled was scored (max. six points).
Block Recall (Gathercole u.a. 2004). The Block Recall test assesses visuospatial memory. It uses a board with nine small blocks. The test administrator taps the blocks with a thin stick in a certain order, and the child's task is to remember the sequence and tap the blocks in the same order. The test administrator taps a maximum of three sequences of each length, starting with sequences of one block. The sequence length is increased by one if the child recalls two sequences of the same length correctly. Children could obtain a maximum of 21 points.

Day-Night Stroop (Diamond u.a. 2007;Gerstadt/Hong/Diamond 1994). This measure assesses inhibitory control. In this version, adapted from Diamond u.a. (2007), the test administrator showed pictures on a computer screen displaying either a yellow sun on a white background or a yellow moon on a dark blue background. The children were first asked to respond verbally by saying 'sun' when the picture of the sun was shown and 'moon' when the picture of the moon was shown. Note that the response words differed from the original task, in which children respond to the picture of the moon by saying 'day' and to the picture of the sun by saying 'night'. Different responses were used to increase the level of difficulty, as children older than five years often show ceiling effects on the original task. After one test trial, the rules were changed: the children were now asked to say 'sun' when the picture of the moon was shown and 'moon' when the picture of the sun was shown. Hence, they had to suppress their tendency to name what was displayed on the screen and name the other object instead. As soon as the child responded, the test administrator pressed a button to display the next picture, so that children only had to respond verbally and no fine or gross motor movements were required. Sixteen pictures were displayed in a fixed order. The children received two points for a correct response, one point for a self-corrected response and zero points for an incorrect response, so a maximum of 32 points could be obtained.

Hearts & Flowers (Diamond u.a. 2007;Shing u.a. 2010). The Hearts & Flowers test assesses all three EF components: inhibitory control, working memory and cognitive flexibility. It requires the child to react as fast as possible, according to two rules, to a stimulus (a red heart or a blue flower) presented on a computer screen. Depending on the stimulus and the side of the screen on which it appears (left or right), the child has to press either the left or the right of two buttons on a small keyboard. In the congruent condition (40 trials), only hearts are presented, and the child is instructed 'to press the button on the same side as the heart'. The incongruent condition (40 trials) consists of blue flowers only and requires the child 'to press the button on the side opposite the flower'. In the mixed condition (40 trials), red hearts and blue flowers appear in random order, requiring the child to apply the rules of the congruent and incongruent conditions flexibly. Children were given up to seven seconds to respond. As soon as the child pressed a button, the next stimulus appeared on the screen. Responses that were made too fast (in less than 0.4 seconds) were not taken into account. Children received one point for every correct response (max. 120 points).
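The scoring rules for the Day-Night Stroop and Hearts & Flowers can be made concrete with a short Python sketch. It is illustrative only: the names `score_day_night`, `HFTrial`, `expected_button` and `score_hf_trial` are hypothetical, and the sketch assumes that a Hearts & Flowers response outside the window of 0.4 to 7 seconds simply earns no point, which the description implies but does not state in those terms.

```python
from dataclasses import dataclass
from typing import List, Optional

# Day-Night Stroop (adapted version): points per trial outcome.
STROOP_POINTS = {"correct": 2, "self_corrected": 1, "incorrect": 0}


def score_day_night(outcomes: List[str]) -> int:
    """Sum trial points; with 16 trials the maximum is 16 * 2 = 32."""
    return sum(STROOP_POINTS[outcome] for outcome in outcomes)


@dataclass
class HFTrial:
    stimulus: str             # "heart" or "flower"
    side: str                 # screen side of the stimulus: "left" or "right"
    pressed: Optional[str]    # button pressed ("left"/"right"), None if no response
    rt: Optional[float]       # response time in seconds, None if no response


def expected_button(stimulus: str, side: str) -> str:
    """Hearts: press on the same side; flowers: press on the opposite side."""
    if stimulus == "heart":
        return side
    return "right" if side == "left" else "left"


def score_hf_trial(trial: HFTrial, min_rt: float = 0.4, max_rt: float = 7.0) -> int:
    """One point for a correct button press inside the response window, else zero."""
    if trial.pressed is None or trial.rt is None:
        return 0  # no response within the time limit
    if trial.rt < min_rt or trial.rt > max_rt:
        return 0  # anticipatory (< 0.4 s) or late responses are not counted (assumption)
    return int(trial.pressed == expected_button(trial.stimulus, trial.side))


# Example: a flower on the left requires a right-hand press.
print(score_hf_trial(HFTrial("flower", "left", "right", 0.85)))  # 1
print(score_day_night(["correct"] * 15 + ["self_corrected"]))    # 31
```

Summing `score_hf_trial` over the 40 congruent, 40 incongruent and 40 mixed trials gives the 0 to 120 range described above.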

Assessment of behavioural executive functions and self-regulation
Two measures were used to assess behavioural EF and SR, Tower and Head-Toes-Knees-Shoulders (HTKS).
Tower (adapted from Kochanska u.a. 1996). For the Tower task, the child was asked to take turns with the test administrator to build a tower out of 15 building blocks. After a brief demonstration of turn-taking, the test administrator began building the tower by placing the first block. After the child had taken its turn, the test administrator withheld his/her next turn until the child signalled that it was the administrator's turn again (e.g. verbally, by handing him/her a block or by waiting). Hence, children had to apply their EF to be successful: they had to resist the urge to place the next block themselves when the test administrator's response was delayed. After the first tower was completed, the test administrator asked the child to build a second tower with him/her using the blocks of the first. Children received a point for each block that was placed correctly, either by themselves or by the test administrator. The sum of correctly placed blocks across both trials was used for the analyses.
Head-Toes-Knees-Shoulders (HTKS; Ponitz u.a. 2008;Ponitz u.a. 2009). The extended version of the HTKS was used to measure behavioural SR. The task requires all three components of EF: inhibitory control, working memory and cognitive flexibility. Children are asked to perform the opposite of the dominant response to four different oral commands. In the first section, when asked to touch their head, they have to touch their toes and vice versa. In the second section, two new rules are added to the two of the first section: when asked to touch their shoulders, they have to touch their knees and vice versa. In the third section, four new rules replace those of the two previous sections: when asked to touch their head, they have to touch their knees and vice versa, and when asked to touch their toes, they have to touch their shoulders and vice versa. The test was administered following the procedure described in Ponitz u.a. (2009), with each section consisting of ten test trials. Children received two points for every correct response, one point for a self-corrected response and zero points for an incorrect response, giving a maximum score of 60 points.
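The three HTKS rule sets and the 2/1/0 item scoring can be written down as a lookup table. The sketch below is hypothetical (names such as `HTKS_RULES` and `htks_item_points` do not come from any published scoring software) and assumes that a self-correction is recorded as a first movement towards a wrong body part followed by a final movement to the correct one.

```python
# Hypothetical encoding of the extended HTKS rules: command -> required response.
HTKS_RULES = {
    1: {"head": "toes", "toes": "head"},
    2: {"head": "toes", "toes": "head", "shoulders": "knees", "knees": "shoulders"},
    3: {"head": "knees", "knees": "head", "toes": "shoulders", "shoulders": "toes"},
}


def htks_item_points(section: int, command: str, first_move: str, final_move: str) -> int:
    """2 = correct at once, 1 = self-corrected to the required part, 0 = incorrect."""
    target = HTKS_RULES[section][command]
    if first_move == target:
        return 2
    if final_move == target:
        return 1
    return 0


def htks_total(trials) -> int:
    """Sum over (section, command, first_move, final_move) tuples; max 3 * 10 * 2 = 60."""
    return sum(htks_item_points(*trial) for trial in trials)


print(htks_item_points(3, "toes", "shoulders", "shoulders"))  # 2
print(htks_item_points(3, "toes", "toes", "shoulders"))       # 1
```

Keeping the rules as data rather than as branches makes the section 3 switch (head to knees, toes to shoulders) explicit and easy to check against the administration manual.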

Teacher ratings of self-control and thoughtfulness
To assess behaviours associated with EF, teachers rated the dimension 'self-control/thoughtfulness' of the observation scale 'Social-emotional well-being and resilience of children in early childhood settings' (PERiK; Mayr/Ulich 2006). Two example items are 'The child waits until it is his/her turn, for example in group discussions, when handing out materials or food' and 'The child can respect the limits set by other children'. The teacher rating is also regarded as a behavioural assessment, as it is based on the behaviour the child shows in the pre-school setting. Although the testing sessions took place at the pre-school, teachers were not present during the administration of the performance-based measures. Therefore, teacher ratings were not influenced by the children's test performance.

Background variables
Background variables of all participants were collected with a caregiver questionnaire on parental education level, migration history and family income.

Procedure
After consent had been obtained from their caregivers, the children were seen on two separate occasions in the autumn of the pre-school year. They were tested individually in two one-on-one sessions in a quiet room at their pre-school by research assistants trained in psychology or educational science. The two testing sessions took place within two weeks of each other, and each session lasted about 25 minutes. The tests were administered in a standard order, as there was no reason to expect order effects. In Session A, children completed the Tower, Block Recall and the HTKS, as well as an interview assessing their social relations with their classmates, which was not analysed for the present study. In Session B, children completed the Day-Night Stroop, Hearts & Flowers and Digit Span backward. After the assessment, each child received a sticker or a colouring picture. The children's pre-school teachers completed the observation scale within two weeks of the testing session. Parents received the questionnaire on paper via the pre-school within two weeks of their child's testing and were given two weeks to return it in an addressed, stamped envelope.

Missing data
A number of variables used in the current analyses had missing data (Table 1). For the cognitive EF measures, Digit Span backward data were missing for ten children, Block Recall data for five children, Day-Night Stroop data for eight children and Hearts & Flowers data for 21 children. Of the integrative behavioural EF and SR measures, Tower data were missing for six children and HTKS data for ten children. Teacher ratings of SC and thoughtfulness were missing for 14 children. The primary reason for missing data was the child's refusal to participate in the particular assessment. The high rate of refusals on Hearts & Flowers was believed to be due to fatigue or boredom, as this was the longest and most monotonous task in the session. The return rate for the socio-demographic questionnaire was 86.6%; questionnaire data were missing for 29 of the 217 children. Data on maternal education were missing for 32 children and on paternal education for 36 children. Data on the child's first language were missing for 29 children. The highest proportion of missing data concerned family income (65 children), probably due to the sensitivity of the question. Data were assumed to be missing completely at random (MCAR). Little's (1988) MCAR test, performed with IBM SPSS Statistics 22 (IBM Corp., 2013), failed to reach significance, which is consistent with the assumption that the data were missing completely at random (χ² (1, N = 217) = 91.09). Missing data were not imputed.

Analytic plan
IBM SPSS Statistics 22 (IBM Corp., 2013) was used to obtain descriptive statistics, analyse missing data and perform the data analyses. To address the first research question, Pearson's bivariate correlations as well as partial correlations controlling for age were computed among all variables in the study to investigate the relationships between background variables and cognitive and behavioural measures. For the second research question, a principal components analysis (PCA) was performed to find support for the distinction between behavioural and cognitive measures. To answer the third research question, whether behavioural measures of EF predict teacher ratings of self-control and thoughtfulness better than cognitive measures of EF, hierarchical multiple regressions were carried out.
Maternal and paternal education, family income, child gender, child age and school were included as covariates, as these factors have been shown to relate significantly to performance on behavioural and cognitive EF tasks (Becker u.a. 2014;Evans/Rosenbaum 2008;Matthews u.a. 2009;Wanless u.a. 2011).
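The covariates-then-measures regression used for the third research question can be sketched as a comparison of two nested ordinary least squares models, where the quantity of interest is the increase in R² from the first to the second step. This is a minimal numpy sketch, not the SPSS procedure used in the study; the helper names `r_squared` and `hierarchical_delta_r2` are hypothetical, and the sketch assumes listwise deletion of incomplete cases, in line with the statement that missing data were not imputed.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def hierarchical_delta_r2(covariates, predictors, y):
    """Step 1: covariates only; step 2: covariates plus EF/SR measures.

    Returns (R^2 step 1, R^2 step 2, Delta R^2). Rows with any missing
    value are dropped first (listwise deletion, an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    cov = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    pred = np.asarray(predictors, dtype=float).reshape(len(y), -1)
    complete = ~np.isnan(np.column_stack([cov, pred, y])).any(axis=1)
    cov, pred, y = cov[complete], pred[complete], y[complete]
    r2_step1 = r_squared(cov, y)
    r2_step2 = r_squared(np.column_stack([cov, pred]), y)
    return r2_step1, r2_step2, r2_step2 - r2_step1
```

In these terms, the 6.4% of variance explained by the covariates and the additional 9.3% explained by the EF and SR measures, as reported in the Results, correspond to R² at step 1 and ΔR² at step 2.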

Preliminary analyses
Analyses were based on data from 217 children. Descriptive statistics of socio-economic variables are presented in Table 2. Of the participants, 53.9% were male. Although about 31% of the children had a history of migration, only 9.6% did not speak German as their first language. Descriptive statistics of all measures are presented in Table 1. All measures showed a good range of performance, with the exception of Digit Span backward, on which floor effects were found. Boys and girls did not differ in age (girls: M age = 52.90 months, SD age = 10.74; boys: M age = 52.59, SD age = 10.58; p = .83). No significant performance differences were found between boys and girls across the measures, with one exception: girls obtained higher scores on the Tower than boys, t(209) = 3.56, p < .01.

Correlations between measures of executive functions, self-regulation and self-control
Pearson's bivariate correlations were run among all variables in the study to investigate the relationships between background variables, socio-economic status and cognitive and behavioural measures (see Table 3). Child age was highly correlated with all cognitive EF measures (rs ranging from .51 to .64, ps < .001). Correlations between child age and behavioural SR were in general somewhat lower, with rs ranging from .22 to .64 (ps < .01). Maternal education was positively related to paternal education (r = .42, p < .001). Both maternal and paternal education were positively correlated with family income (r = .51 and r = .52, respectively, both ps < .001). Background variables also showed significant correlations with several EF measures and with teacher ratings. Maternal education showed low but significant positive correlations with performance on Hearts & Flowers (r = .17, p < .05), Digit Span backward (r = .22, p < .01) and teacher ratings of self-control (r = .16, p < .05). Paternal education showed a significant positive correlation only with Digit Span backward (r = .17, p < .05). Family income was not related to any EF measure. Correlations between EF, SR and SC measures are also presented in Table 3.

[Table 3: Pearson's bivariate correlations among background variables and measures. Note: CEF = cognitive executive functions measure, BEF = behavioural executive functions measure, BSR = behavioural self-regulation measure, TR = teacher rating; *p < .05, **p < .01, ***p < .001.]

Because of the high correlations with age, partial correlations controlling for child age were computed for all measures (see Table 4); the results reported below are these age-controlled partial correlations. When age was controlled for, most correlations decreased in size, and some significant correlations disappeared, namely those between Day-Night Stroop and Digit Span backward and between Tower and Digit Span backward. For all but one pair of measures (Day-Night Stroop and Digit Span backward), significant correlations were found. Significant correlations ranged between .18 and .42. Measures tapping the same EF skill showed higher correlations with one another than with measures tapping different skills. For example, the two working memory measures, Block Recall and Digit Span backward, were moderately correlated (r = .39, p < .001), although the former taps the visuospatial sketchpad and the latter the phonological loop. Hearts & Flowers, which taps all three components of EF, showed moderate correlations with all measures (rs ranging from .30 to .42, ps < .01).

[Table 4: Partial correlations controlling for child age. Note: CEF = cognitive executive functions measure, BEF = behavioural executive functions measure, BSR = behavioural self-regulation measure, TR = teacher rating; *p < .05, **p < .01, ***p < .001.]

The correlation between the two behavioural measures, Tower and HTKS, was .23 (p = .01). Relations among the direct assessments of EF were therefore on average higher than the relation between the two behavioural measures. Significant relations were found between most cognitive measures and the two behavioural measures; no significant correlation was found between Digit Span backward and Tower (r = .12, p = .19). Correlations between cognitive EF measures and HTKS were on average substantially higher (rs from .21 to .50, all ps < .01) than the significant correlations with Tower (rs from .19 to .37, all ps < .05). For teacher ratings, significant correlations were found with three of the four cognitive EF measures: Block Recall (r = .21, p = .01), Digit Span backward (r = .16, p = .03) and Hearts & Flowers (r = .15, p = .04). A low but significant correlation was found between the Tower and teacher ratings (r = .20, p = .01). The correlation between the other behavioural measure, HTKS, and teacher ratings did not reach significance (r = .07, p = .38).
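Controlling for age in these correlations amounts to a first-order partial correlation, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), with z = child age. The sketch below computes it both from this formula and by correlating the residuals of x and y after regressing each on age; the two routes give the same value. The function names are hypothetical, and the sketch assumes complete data for the three variables involved.

```python
import numpy as np


def partial_corr(x, y, z) -> float:
    """First-order partial correlation r_xy.z from the three pairwise correlations."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return float((r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2)))


def partial_corr_residuals(x, y, z) -> float:
    """Same quantity: correlate the residuals of x and y after regressing each on z."""
    design = np.column_stack([np.ones(len(z)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Illustration with simulated scores that share only an age trend:
rng = np.random.default_rng(1)
age = rng.uniform(34, 72, size=300)            # months, as in the sample range
task_a = 0.05 * age + rng.normal(size=300)
task_b = 0.04 * age + rng.normal(size=300)
print(np.corrcoef(task_a, task_b)[0, 1], partial_corr(task_a, task_b, age))
```

In the simulated example the raw correlation is driven by age alone, so it shrinks towards zero once age is partialled out, which mirrors the general drop in correlations reported above.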

Distinction between cognitive and behavioural measures
In order to examine the components underlying the cognitive and behavioural measures of EF-related constructs, a principal components analysis (PCA) was carried out that included all of the measures described. The first unrotated principal component (FUPC) accounts for the maximum amount of variance in the measured variables, and the loading of each measure reflects its correlation with this overall component. The Kaiser-Meyer-Olkin measure of sampling adequacy (KMO = .69) indicated that the data were suitable for structure detection by factor analysis. Loadings of .30 or above are typically considered acceptable (Tabachnick/Fidell 1983). All measures loaded above this threshold on the FUPC, as shown in Table 5.
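Both quantities mentioned here, the KMO measure and the FUPC loadings, can be computed directly from the correlation matrix of the measures. The NumPy sketch below assumes that a complete, positive-definite correlation matrix is available. Its function names are hypothetical, and the example matrix is invented rather than taken from the study. KMO compares the squared zero-order correlations with the squared anti-image (partial) correlations. The FUPC loadings are the first eigenvector scaled by the square root of its eigenvalue.

```python
import numpy as np


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    # Partial correlation of each pair, controlling for all other variables
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)


def fupc_loadings(R):
    """Loadings (variable-component correlations) on the first unrotated principal component."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    # eigh returns eigenvalues in ascending order, so the largest is last
    v = eigvecs[:, -1]
    if v.sum() < 0:  # the sign of an eigenvector is arbitrary; orient loadings positively
        v = -v
    return v * np.sqrt(eigvals[-1])


# Hypothetical example: four measures that all intercorrelate at .40
R = np.full((4, 4), 0.40)
np.fill_diagonal(R, 1.0)
print(round(kmo(R), 2))                        # 0.76
print(np.round(fupc_loadings(R), 2))           # [0.74 0.74 0.74 0.74]
print(bool(np.all(fupc_loadings(R) >= 0.30)))  # True: all above the .30 threshold
```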
A PCA with direct oblimin rotation extracted two major components, which together accounted for 61.8% of the variance. All components with an eigenvalue greater than 1 were retained, and scree plots were examined to confirm this selection. Measures with component loadings higher than .5 were considered to load on a given component. The first principal component accounted for 47.91% of the variance and comprised performance on Block Recall, Day-Night Stroop, Hearts & Flowers, Digit Span backward and HTKS. The second principal component accounted for 13.86% of the variance and comprised the Tower and the teacher rating of 'self-control/thoughtfulness'.
Note: Component loadings > .5 are presented.
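The retention rule and the percentages of explained variance follow from the eigenvalues of the correlation matrix. Each component explains its eigenvalue divided by the number of variables. The NumPy sketch below assumes a complete correlation matrix and uses an invented matrix; the function name is hypothetical. It covers only the unrotated extraction step. The direct oblimin rotation applied in the study is not reproduced here and would need a dedicated factor-analysis routine.

```python
import numpy as np


def kaiser_summary(R):
    """Eigenvalues (descending), % of total variance per component, and the number
    of components with eigenvalue > 1 (Kaiser criterion) for correlation matrix R."""
    eigvals = np.linalg.eigvalsh(np.asarray(R, dtype=float))[::-1]
    # The eigenvalues of a correlation matrix sum to the number of variables
    pct_variance = 100 * eigvals / eigvals.sum()
    n_retained = int(np.sum(eigvals > 1.0))
    return eigvals, pct_variance, n_retained


# Hypothetical example: two clusters of three measures (r = .50 within, 0 between)
block = np.full((3, 3), 0.5)
np.fill_diagonal(block, 1.0)
R = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
eigvals, pct, k = kaiser_summary(R)
print(k)                        # 2
print(np.round(pct[:k], 1))     # [33.3 33.3]
print(round(pct[:k].sum(), 1))  # 66.7
```

Plotting `eigvals` against component number gives the scree plot used to confirm the selection. Summing `pct[:k]` gives the joint variance explained by the retained components, analogous to the 61.8% reported above.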

Predicting teacher rating of self-control and thoughtfulness
The correlational analyses indicated that most of the cognitive and behavioural tasks showed significant but low correlations with teacher ratings of self-control and thoughtfulness. To test their value in predicting the teacher rating, a hierarchical multiple regression was carried out. The covariates child age, parental education and migration history were entered into the first model to control for their effects; the second model added all measures of EF and SR. Parameter estimates are presented in Table 6. After adjusting for the covariates child age, child gender, parental education and migration history, only Block Recall (β = .24, B = .29, p = .032) was significantly associated with the teacher rating. Performance on the Tower was marginally associated with the teacher rating (β = .15, B = .11, p = .063). The covariates explained 6.4% of the variance in the teacher rating of self-control and thoughtfulness, and the measures of EF and SR explained an additional 9.3%.

To better distinguish between the numerous existing measures, they are sorted into various categories such as performance-based measures and behaviour ratings. Within the category of performance-based measures, finer distinctions exist, e.g. between cognitive and behavioural measures (McClelland/Cameron/Connor u.a. 2007; McClelland/Cameron/Wanless u.a. 2007; Ponitz u.a. 2009) and between hot and cool measures (e.g. Hongwanishkul u.a. 2005; Zelazo/Carlson 2012).
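The two-step hierarchical regression used to predict the teacher rating can be sketched as follows. Covariates are entered first, then all EF and SR measures are added, and the R² change is tested. This is a minimal NumPy illustration under stated assumptions: the function names are hypothetical, the data are simulated rather than taken from the study, and the F statistic for the R² change is computed without a p-value. In the study, the covariates explained 6.4% and the EF and SR measures an additional 9.3% of the variance, giving a total R² of about .157 for the second model.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares regression of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_r2(covariates, predictors, y):
    """Step 1: covariates only; step 2: covariates plus predictors.
    Returns R^2 of both steps, the R^2 change and the F statistic for the change."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    C = np.asarray(covariates, dtype=float).reshape(n, -1)
    P = np.asarray(predictors, dtype=float).reshape(n, -1)
    r2_step1 = r_squared(C, y)
    r2_step2 = r_squared(np.column_stack([C, P]), y)
    k1, k2 = C.shape[1], P.shape[1]
    delta_r2 = r2_step2 - r2_step1
    f_change = (delta_r2 / k2) / ((1.0 - r2_step2) / (n - k1 - k2 - 1))
    return r2_step1, r2_step2, delta_r2, f_change


def standardized_betas(X, y):
    """Standardized coefficients: beta = unstandardized B * SD(x) / SD(y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)


# Hypothetical simulated data (not the study's): 2 covariates, 3 task scores, 1 rating
rng = np.random.default_rng(0)
n = 200
cov = rng.normal(size=(n, 2))
tasks = rng.normal(size=(n, 3))
rating = 0.3 * cov[:, 0] + 0.4 * tasks[:, 0] + rng.normal(size=n)
r2_1, r2_2, d_r2, f = hierarchical_r2(cov, tasks, rating)
print(f"Step 1 R2 = {r2_1:.3f}, Step 2 R2 = {r2_2:.3f}, delta R2 = {d_r2:.3f}, F change = {f:.2f}")
```

The β values reported alongside B are the unstandardized coefficients rescaled by the ratio of the predictor's to the outcome's standard deviation. `standardized_betas` illustrates this conversion.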
The aim of the study was to explore the validity of the putative distinction between cognitive and behavioural measures of EF, SR and SC. We investigated whether measures of the same category actually tap similar constructs. We also wanted to find out which measures are most closely related to the behaviour of preschoolers as observed by their teachers.
Six cognitive and behavioural EF and SR measures as well as teacher ratings of children's SC and thoughtfulness were used. Performance-based measures were assessed in two testing sessions. It should be noted that both behavioural measures (Tower and HTKS) were administered within the same testing session. It could therefore be argued that tasks administered within the same session correlate more highly because of similar levels of motivation or concentration.
All measures were suitable for the age range in question except for Digit Span backward, which proved quite challenging, especially for younger children. Gender differences emerged on one measure, the Tower task. It has been found before that girls outperform boys on EF and SR tasks, especially in this age group (e.g. …). The Tower also involves a strong social component: taking turns and keeping the partner in mind. Studies have shown that girls outperform boys not only on EF and SR tasks but also in their social-emotional competences (e.g. Denham u.a. 2003; Walker 2005).
Parental education showed only a few significant relations with the EF tasks (maternal education with Hearts & Flowers and Digit Span backward, paternal education with Digit Span backward only), and family income showed none. Previous research has found that family income is also related to children's performance on EF tasks (Evans/Rosenbaum 2008); this finding was not replicated in the present study.
Most measures were significantly correlated with each other. Some cognitive EF measures were not significantly correlated, e.g. Day-Night Stroop and Digit Span backward. However, since these two measures tap different components of EF (inhibitory control and working memory, respectively), this finding is not surprising. It was hypothesized that correlations among the behavioural EF and SR measures and the teacher ratings would be higher than their correlations with the cognitive EF measures. Contrary to this hypothesis, marked differences were found in the strength of the correlations between the different measures. The behavioural SR measure HTKS correlated substantially with the cognitive EF measures, much more highly than with the second behavioural measure, the Tower, and with the teacher rating of SC and thoughtfulness. The first hypothesis was therefore not supported by the data. This unexpected result might be due to the fact that the HTKS addresses all three EF components (Ponitz u.a. 2008; Ponitz u.a. 2009), whereas the Tower mainly addresses inhibitory control (Kochanska u.a. 1996). Although both tasks require the child to carry out responses behaviourally, the Tower may resemble a real-life situation more closely than the HTKS and may therefore have higher ecological validity. Its significant correlation with the teacher rating, which was not found for the HTKS, supports this assumption.
The principal component analysis supported the results of the correlational analysis. The PCA provided no evidence for the distinction between cognitive and behavioural measures stated by the task authors. Although two distinct components were identified, they did not separate cognitive from behavioural measures: the first component comprised all cognitive measures and the HTKS, the second the Tower and the teacher rating. It was hypothesized that the HTKS, as a behavioural SR measure, would fall into the second category. Based on the results, it may be assumed that the HTKS is more closely related to the cognitive measures than to the other behavioural measures.
The question that arises is: what makes a measure behavioural in nature? The authors of the HTKS claim that it measures behavioural self-regulation, although performance seems to rely strongly on the developmental level of the three EF components that are explicitly mentioned in the task description (Ponitz u.a. 2008; Ponitz u.a. 2009). The Tower, on the other hand, assesses the ability to suppress and initiate activity to signal (Kochanska u.a. 1996). In contrast to the HTKS, which taps all three EF components, inhibitory control could be argued to be the most central component of the Tower, as the child has to refrain from placing a block when it is the test administrator's turn. The child is also required to keep in mind the rule of taking turns in placing blocks. However, unlike in the HTKS, this rule never changes. The task therefore relies only to a small extent on working memory capacity.
Looking at the testing situations of both tasks, it becomes evident that the Tower closely resembles the typical play situations that children frequently experience in preschool and at home. The Tower also has a much stronger social component than the HTKS, as the child has to be considerate of the test administrator in order to perform well. The teacher rating scale used here has a strong social component too, which could be the common factor explaining the outcome of the PCA. In comparison, the HTKS, although it can be regarded as a rather playful measure, is less of a mutual play situation, because the child is required to act upon the commands given by the test administrator. In fact, this is the only factor that distinguishes the HTKS from all the cognitive EF tasks: the child is required to act out its response by reaching towards a certain body part. Apart from that, the task is very similar to Hearts & Flowers, for example, where the child has to respond as quickly and accurately as possible, only with a fine motor movement (pressing a button) instead of a gross motor movement. Moreover, like the cognitive tasks, the HTKS requires the child to persevere and stay on task in order to perform well.
High correlations between the HTKS and the cognitive EF measures might therefore be due to the fact that all of them share a common denominator: perseverance and focus on the task. Both factors are thought to be important for academic success but not necessarily for social behaviour (Rhoades/Greenberg/Domitrovich 2009). A study by Ponitz u.a. (2009) showed that the HTKS is a valid predictor of academic but not of social skills of children in primary school: performance on the HTKS predicted children's literacy and mathematical skills half a year later, but not their interpersonal competences. Inhibitory control, on the other hand, which plays an important role in the Tower task, has been shown to be very important for social-emotional competence (Kochanska/Murray/Harlan 2000; Rhoades/Greenberg/Domitrovich 2009).
The third aim of the study was to examine the predictive value of EF and SR measures for everyday behaviour. For this purpose, a teacher rating of children's self-control and thoughtfulness was included. One cognitive EF measure of visuospatial working memory, Block Recall, explained a significant proportion of the variance in children's SC and thoughtfulness. One measure of behavioural EF, the Tower, was marginally significant. The significant contribution of working memory was quite unexpected. Based on the literature, it was hypothesized that behavioural measures would explain more variance than cognitive measures. Furthermore, of the three EF components, inhibitory control was expected to be the one most closely associated with the rating scale assessing SC and thoughtfulness, so a contribution of a cognitive EF measure assessing inhibitory control would have been most likely. The marginal significance of the Tower was therefore much more in line with expectations.
Is a distinction between cognitive EF and behavioural SR measures useful? Based on the findings of this study, the division of measures into subcategories (i.e. behavioural and cognitive) claimed by the task authors should be treated with caution, especially when measures are chosen with the aim of ensuring the ecological validity of the assessment. Categories more refined than the mere distinctions between performance-based measures and ratings, EF and SR tasks, or cognitive and behavioural measures could be useful when aiming to assess EF-related constructs comprehensively. However, each category has to be defined more clearly so that the measures are distinguishable. Regarding the category of behavioural measures, for example, it needs to be discussed whether a task in which the response has to be acted out, rather than given by an utterance or a button press, already qualifies as behavioural, as argued by the task authors (Ponitz u.a. 2008; Ponitz u.a. 2009). Moreover, a distinction between measures that require social interaction and those that do not should be considered. Interventions that aim to improve the application of EF or self-regulatory competences in school settings, for example, would benefit from a more distinctive labelling of measures that assess EF-related behaviour in academic as well as social contexts.