Adolescent risk-taking in the context of exploration and social influence

Adolescents are often described as a strange and different species that behaves like no other age group, with excessive risk-taking and sensitivity to peer influence as typical behaviours. Different theories of adolescent behaviour attribute this to different internal mechanisms, such as undeveloped cognitive control, higher sensation-seeking or extraordinary social motivation. Many agree that some adolescent risk-taking behaviour is adaptive. Here we argue that to understand adolescent risk-taking, and why it may be adaptive, research needs to pay attention to the structure of the adolescent environment and view adolescents as learning and exploring agents within it. We identify three unique aspects of the adolescent environment: 1) the opportunities to take risks increase significantly, 2) these opportunities are novel and their outcomes uncertain, and 3) peers become more important. Next, we illustrate how adolescent risk-taking may emerge from learning using agent-based modelling, and show that a typical inverted-U shape in risk-taking may emerge in the absence of a specific adolescent motivational drive for sensation-seeking or sensitivity to social information. The simulations also show how risky exploration may be necessary for adolescents to gain long-term benefits in later developmental stages and that social learning can help reduce losses. Finally, we discuss how a renewed ecological perspective and the focus on adolescents as learning agents may shift the interpretation of current findings and inspire


Introduction
A time-honoured view of adolescence, often defined as the period between ages 10 and 21 (van Duijvenvoorde et al., 2016), is that of a period of trials and tribulations (Sturm und Drang) on the way to adulthood (Hall, 1904). Consistent with this view, adolescence is associated with a peak in risky behaviours such as reckless driving, crime, binge drinking, unprotected sex, and experimenting with drugs (Gullone et al., 2000; Johnston et al., 2014; Shulman et al., 2013; Steinberg et al., 2018). Typically, these behaviours occur in the presence or presumed influence of peers (Albert et al., 2013; Monahan et al., 2009; van Duijvenvoorde et al., 2016). It is often stressed that, although adolescent risk-taking has detrimental side-effects, part of these behaviours may serve some adaptive function; how exactly, however, remains unclear. Here we argue that if we aim to understand how risk-taking and peer influence can be adaptive, we need to (1) put more emphasis on understanding the interaction between learning, exploration and risk behaviour, and (2) better understand the interaction between adolescents and their environment. The first point builds on the idea that taking a risk can lead to meaningful experiences that will be beneficial in a later developmental stage (Baumrind, 1987; Romer et al., 2017; van den Bos et al., 2019). The second point is vital because behaviour can only be adaptive in relation to the environment in which it occurs (Simon, 1956). We believe that such a broader perspective will enrich the general understanding of adolescent behaviour. We begin by reviewing how risk-taking is defined in the adolescent literature, indicating a distinction between impulsive and planned risk-taking, and a shift of focus from merely harmful behaviour to exploration and learning. Next, we identify some key factors that characterize the environment which adolescents have to explore. Finally, we implement such an environment and run an agent-based learning model, providing evidence that the "typical" adolescent peak in risky behaviour may emerge from the interaction of an exploring agent and the environment in the absence of adolescent-specific motivational drives for reward or social feedback. In addition, our simulations indicate that under certain circumstances, both risk-taking and social influence can have long-term benefits, even though there are also negative outcomes in the short term.

Risk, uncertainty and exploration
Risk-taking does not refer to a well-defined set of actions (Frey et al., 2017). Risk-taking is also not necessarily illegal or dangerous. Instead, taking a risk is taking an action for which the outcome is uncertain, and potential consequences can be either positive or negative (Hertwig et al., 2019). Based on this definition, at least two types of risk-taking can be distinguished according to their underlying motivational mechanisms: reactive and reasoned risk-taking. The first type, reactive risk-taking, is thought to arise from the combination of poor response inhibition and increased reward sensitivity (Rosenbaum et al., 2018; Shulman et al., 2016; Steinberg, 2008a, 2008b). According to several adolescent risk-taking frameworks, this mismatch is due to an adolescent-specific imbalance between neural systems that support cognitive control and those that support reward processing (Ernst et al., 2005; Luna et al., 2013; Shulman et al., 2016). Although there is no doubt that some of the typical reckless adolescent behaviour falls in this category, and that this can lead to some undesirable outcomes, more recently it has been suggested that a significant proportion of adolescent behaviour comprises reasoned risk behaviour. Reasoned risk is strategic, planned well in advance, and relies on increasing cognitive control capacity in combination with an increased drive towards sensation seeking (Romer et al., 2017). Along these lines, a recent study reported that risk behaviour was associated not only with higher levels of sensation seeking but also with better working memory and greater future orientation (Maslowsky et al., 2019).
Furthermore, adolescents' self-reported perceived risk (e.g. how risky is binge drinking?) is negatively correlated with their engagement in risky behaviours (Ciranka & Bos, 2021; Johnston et al., 2014). This suggests that adolescents consider costs and benefits before engaging in risky behaviour. What unites many current theories of adolescent risk behaviour is the assumption that it can be adaptive. It is hypothesized that taking a risk can generate meaningful experiences that enable adolescents to interact with their future environment beneficially, and help adolescents to explore and learn about the world and themselves (Baumrind, 1987; Crone & Dahl, 2012; Rodman et al., 2017; Romer et al., 2017; Telzer, 2016; van den Bos et al., 2019; Worthman & Trang, 2018). Indeed, the adaptive potential of adolescent risk-taking behaviour becomes apparent when considering the developmental tasks and environment which adolescents are facing. Adolescents have to learn how to set up an independent household, become economically self-sufficient and emotionally stable, find their place in novel peer groups, build their own identity and eventually establish a family unit of their own. In other words, adolescents' developmental task is to become an independent adult (Nelson et al., 2016). On the way to adulthood, adolescents could hardly succeed if they did not take the risk to "leave their nest" (Bowers & Natterson-Horowitz, 2020). Such a notion dovetails with the general definition of risk-taking as "taking an action where outcomes are uncertain and could both be harmful or beneficial". As such, part of adolescent risk-taking might be re-cast, from simply doing something potentially harmful to a more goal-directed act of exploration. Taking the risk to explore novel environments may lead adolescents to discover new niches and learn about novel opportunities (Sercombe, 2014; Willoughby et al., 2013). This perspective also suggests that when a child enters the world of adolescence, and much is unknown, exploration has high benefits, but these benefits will inevitably decline as a function of learning. In other words, exploration-based risk-taking will introduce a sudden increase in risk-taking that decreases again towards adulthood, leading to an adolescent peak in risky behaviour.

Peer influence and social learning
The adolescent peak in risk-taking is often attributed to an adolescent-specific response to their peers. Some theories emphasize that peer presence is especially arousing for adolescents (Gardner & Steinberg, 2005). This arousal leads adolescents to focus on rewards, resulting in impulsive decisions and risk-taking (Albert et al., 2013; Shulman et al., 2016). For instance, some studies suggest that the general arousal associated with peer presence makes adolescents drive more riskily in a driving simulation (Chein et al., 2011; Gardner & Steinberg, 2005). On the other hand, adolescents may only show increased risk-taking behaviour when they believe that their peers expect them to drive aggressively, suggesting a form of reasoned risk-taking (Blakemore & Mills, 2014; Romer et al., 2014). According to this view, peer influence is more in line with planned risk-taking because it might be the result of a cost-benefit analysis in which one specifically considers the social benefits, for instance gaining status or belonging to a group (Blakemore, 2018; Blakemore, this issue; Cialdini & Goldstein, 2004; Ciranka & van den Bos, 2019; Yeager et al., 2018).
Yet there is a third perspective on social influence, currently underrepresented in the adolescent literature: social influence comprises social learning, which can increase one's confidence about how to make decisions in a complex and uncertain environment (FeldmanHall & Shenhav, 2019; Gigerenzer & Gaissmaier, 2011; Morgan & Laland, 2012; Morgan et al., 2015; Toelch & Dolan, 2015). For instance, observing others may provide information about which actions are more or less likely to lead to rewards. When faced with a novel and uncertain environment, adopting others' behaviour can be beneficial (Chase et al., 1998; Mehlhorn et al., 2015) because it protects the individual from potentially costly trial-and-error learning (FeldmanHall et al., 2017; Molleman et al., 2019; Molleman et al., 2014). Several empirical studies in adults (Behrens et al., 2008; Biele et al., 2011; Ciranka & van den Bos, 2020; Toyokawa et al., 2019) and children (Bandura, 1962; Morgan et al., 2015; Walden & Ogan, 1988; Zarbatany & Lamb, 1985) showed that when people are more uncertain, they use social information more. From a developmental perspective, this suggests that a life phase associated with novelty and uncertainty, like adolescence, will also be associated with more social information use.

Learning and the environment
To date, learning and experience do not play a significant role in many existing theories of adolescent risk-taking. Exceptions are fuzzy trace theory (Rivers et al., 2008) and the Life Span Wisdom Model (Romer et al., 2017). However, neither is an explicit or formal model of learning. A formalism would aid in generating expectations about how experience and knowledge will impact future behaviour and what a normative learning process might look like across development.
To further our intuitions about how risky behaviour, specifically its developmental rise and fall, may emerge from the interaction between adolescents and their environment, we turn to the formal framework of reinforcement learning (Sutton & Barto, 2018). A reinforcement learning agent must learn to make good decisions in an uncertain environment by interacting with it (Collins & Cockburn, 2020), much like adolescents do (Davidow et al., 2018). Good decisions are those that reap the most long-term rewards for the agent in a given environment. Eventually, the agents' behaviour will be optimally adapted to their experience with their environment (Fig. 1).
Evidence suggests that learning to interact with novel environments can be characterized by some form of Bayesian learning (Daw et al., 2005, 2006; Dayan & Daw, 2008; Knill & Pouget, 2004; Marković & Kiebel, 2016; Mathys et al., 2011; Nassar et al., 2010). In Bayesian learning, in contrast to classic reinforcement learning, learning occurs faster when the learner experiences more uncertainty. Because of this feature, it has been argued that Bayesian models resemble how individuals assimilate new information into their beliefs across development (Frankenhuis & Panchanathan, 2011; Gopnik et al., 2017; Stamps & Frankenhuis, 2016; Tenenbaum et al., 2011). From this point of view, many developmental tasks are about navigating a complex and uncertain environment in order to find a good solution based on experience. Such a task presents the learner with an explore-exploit dilemma (Addicott et al., 2017; Gopnik, 2020). Too much exploitation (choosing known good options) prevents the learner from gathering new information, and thus one may miss out on even more rewarding options. Too much exploration (gathering novel information) may be inefficient because of the high opportunity cost associated with not sampling the best option, thereby reducing long-term prospects.
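To make the contrast between a fixed learning rate and uncertainty-weighted learning concrete, the following minimal Python sketch compares a classic delta rule with a normal-normal Bayesian update. The function names, the observations, the prior and the noise variance are all hypothetical values chosen for illustration, not taken from the simulations reported here.

```python
def delta_rule_update(estimate, outcome, learning_rate=0.2):
    """Classic delta rule: the learning rate is fixed, whatever the experience."""
    return estimate + learning_rate * (outcome - estimate)


def bayesian_update(mean, variance, outcome, noise_variance):
    """Normal-normal Bayesian update: the effective learning rate (gain)
    is large while the belief is uncertain and shrinks as variance falls."""
    gain = variance / (variance + noise_variance)
    new_mean = mean + gain * (outcome - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance, gain


outcomes = [10.0, 12.0, 8.0, 11.0]  # hypothetical observations of one option
mean, variance = 0.0, 100.0         # hypothetical vague prior belief
fixed_estimate = 0.0
gains = []
for y in outcomes:
    fixed_estimate = delta_rule_update(fixed_estimate, y)
    mean, variance, gain = bayesian_update(mean, variance, y, noise_variance=25.0)
    gains.append(gain)

print([round(g, 3) for g in gains])
```

Under these assumed values the Bayesian gain falls from 0.8 on the first observation to roughly 0.24 on the fourth, whereas the delta rule keeps weighting every new outcome by the same 0.2.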
The most widely used paradigm to study this dilemma is the multi-armed bandit task (Daw et al., 2006). It mirrors a casino's slot machine with multiple arms, where each arm is associated with a different reward distribution. Obtaining a reliable understanding of all possible rewards requires vast amounts of exploration, which costs time or resources that could be spent pulling the best arm; hence the dilemma. How much exploration is rational depends not only on individual experience but also on the environment in which learners find themselves. For instance, in a novel or volatile environment, more exploration is beneficial, while in a stable and well-known world, exploitation becomes more attractive (Behrens et al., 2007; Mathys et al., 2011). When there is considerable uncertainty, social information can reduce the need for exploration: simply observing others provides information about what should be exploited or avoided (Mehlhorn et al., 2015).
Studying the learning of artificial agents provides a laboratory for understanding the dynamics of human learning as well (Gershman et al., 2015; Rahwan et al., 2019). Since reinforcement learning formalizes the interaction between agent and environment, however, we need to zoom in on some specific aspects of the adolescent environment in order to generate a meaningful metaphor for adolescent development. First, parental oversight decreases when adolescence begins, and the opportunities for engaging in risky behaviour increase (Defoe et al., 2019; Sercombe, 2014; Willoughby et al., 2013; Willoughby et al., this issue). Second, these opportunities have often not been explored before, making their benefits uncertain, and because losses are possible, exploring those opportunities carries a risk (Hertwig et al., 2019). Third, there are significant changes in the social world (Blakemore & Mills, 2014).
In the following, we focus on these three salient features of the adolescent environment: 1) increasing opportunity for risks, 2) uncertainty about the world and 3) the presence of peers, and implement Bayesian learning agents who explore this environment. These agents possess a simple set of rules according to which they act, but viewing their behaviour simultaneously and over time will unravel complex properties, beyond the simple decision rules of one individual at one point in time (Bonabeau, 2002). By these means, we show how exploration and learning, together with changes in the environment, can lead to outcomes that resemble developmental trajectories of risk-taking and social susceptibility observed in adolescents, without assuming developmental changes in reward or social sensitivity.

The simulated environment
We simulate an environment that carries three features of the environment adolescents face, to show that adolescent-specific risky behaviour may emerge merely from learning and exploration. First, the number of options increases after an initial childhood learning period. Second, exploration is risky and can lead to gains but also to losses. We assume that adolescents have access to more dangerous options (with more negative outcomes) than those provided in childhood (Baumrind, 1987; Defoe et al., 2015, 2019; Sercombe, 2014). Third, there is social information, meaning that similar agents explore the same environment simultaneously, and agents can observe each other.
The simulated environment confronts our learning agents with a multi-armed bandit problem, which is often used to study how humans trade off exploration against exploitation (Daw et al., 2006; Schulz et al., 2019; Wu et al., 2018). The problem consists of 144 different options, each associated with a different reward distribution (Fig. 2), that agents can explore to find an option that maximizes long-term rewards. Every time an agent decides for one option, the environment produces an outcome: a random draw from a normal distribution. The outcome is either positive (gain) or negative (loss). Options differ in their expected reward (from −100 to 100) and variance (from 5 to 80). By varying the mean and variance of the options' underlying distributions, we generated an environment in which exploration is risky (Sani et al., 2012), according to the definition of risk-taking: sampling a novel option can result in losses, and there is uncertainty about the outcomes (Hertwig et al., 2019). The environment's complexity increases in two stages. In the first stage, agents can only explore a constrained section of their environment, with 36 options to sample. These options are relatively predictable (the variance ranges from 5 to 40) and avoid great losses and rewards (the mean ranges from −50 to 50). This reflects a childhood period where adults strongly restrict the environment of children to keep them safe. In the second, "adolescent" stage, we introduce novel options to explore. More precisely, all options in Fig. 2 become available for the agents to explore. This mirrors the increased risk exposure that adolescents likely face in the real world (Defoe et al., 2019; Willoughby et al., 2013). Adolescents' options are both better and worse than those presented in the childhood world, echoing the risks and opportunities of adolescence (Dahl, 2004). We note that in reality, it is unlikely that every risky behaviour becomes available for exploration at the same time. For simplicity we chose not to implement such a gradual "opening": while it could make the simulations more realistic (see Discussion), their qualitative patterns would remain unchanged.
Fig. 2. The 144-armed bandit used for our simulations. Each square represents one possible outcome from its underlying reward distribution. Examples of the most extreme distributions in our environment are depicted in the margins, where the y-axis shows the probability of receiving the outcome on the x-axis. The environment offers different amounts of expected rewards (x-axis) at varying degrees of uncertainty (y-axis). The middle grid within the red square depicts a "child" agent's search space, constricted to medium-sized gains and losses with relatively low risk. The whole grid shows an "adolescent" agent's search space, where large gains and losses are possible at high and low levels of risk.
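The two-stage environment can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the original implementation (which was written in R): the 12 × 12 grid with evenly spaced means and variances, and all names such as `available_options` and `sample_outcome`, are hypothetical. Only the ranges and the counts of 144 and 36 options come from the text.

```python
import numpy as np

# Assumption: a 12 x 12 grid with evenly spaced expected rewards and variances.
MEANS = np.linspace(-100, 100, 12)   # expected reward of an option
VARIANCES = np.linspace(5, 80, 12)   # outcome variance of an option
option_mean, option_var = (a.ravel() for a in np.meshgrid(MEANS, VARIANCES))

# "Childhood" stage: moderate means (-50 to 50) and low variance (5 to 40).
child_mask = (np.abs(option_mean) <= 50) & (option_var <= 40)


def available_options(stage):
    """Indices of the options an agent may sample in a given stage."""
    if stage == "child":
        return np.flatnonzero(child_mask)
    return np.arange(option_mean.size)  # adolescent stage: the whole grid


def sample_outcome(option, rng):
    """Outcome of choosing an option: one draw from its normal distribution."""
    return rng.normal(option_mean[option], np.sqrt(option_var[option]))


rng = np.random.default_rng(0)
print(len(available_options("child")), len(available_options("adolescent")))
```

With this spacing, the childhood restriction selects exactly 36 of the 144 options, matching the counts given above.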

Agents
Agents in our simulations make decisions under uncertainty (Hertwig et al., 2004). Thus, unlike situations under risk or ambiguity, and more like real life (Gigerenzer & Gaissmaier, 2011; Gigerenzer & Sturm, 2012; Knight, 1921; Volz & Gigerenzer, 2012), outcomes and their probabilities are unknown to the agents before they explore their environment. We can break down the agents into three elements: beliefs, a belief update rule and a choice rule. When making a decision, agents can either take a risk and switch away from the best option they know so far, or they can exploit their knowledge base and stay with this option. In our implementation (Daw et al., 2006; Schulz et al., 2019), agents learn and update their beliefs about their environment's statistical structure using Bayes' rule (Fig. 3).
At every point in time, $t$, agents hold a specific belief about each option, $j$, in their environment: namely, that the option will result in some reward ($\mu$), with some uncertainty ($\sigma^2$) about this expectation (Fig. 3A). After every decision, they update their beliefs with a prediction error. The prediction error is the difference between the reward expected by the agent ($\mu_{j,t-1}$; Fig. 3A) and the actual reward received after deciding for an option ($y_{j,t}$; Fig. 3B). For a given option, the mean and uncertainty are updated only when the option was selected (Fig. 3C); this is encoded by an indicator $\zeta_{j,t}$, which equals 1 if option $j$ is chosen on trial $t$, and 0 otherwise. $K$ refers to the Kalman gain, which can be interpreted as a learning rate and is defined by the agents' uncertainty on the previous trial, and $\epsilon$ is an error constant, denoting the range of outcomes expected in the whole environment. Notably, in an analogy to Bayesian models of human development (Frankenhuis & Panchanathan, 2011; Tenenbaum et al., 2011), when agents are more experienced they will change their beliefs about their environment less. We set the initial beliefs about unchosen options to be optimistic, but very uncertain ($\mu_0 = 100$, $\sigma^2_0 = 40$). We chose relatively optimistic priors (the prior mean expected reward was 100, whereas the actual mean reward rate of the whole environment is 0) for two reasons. First, this motivates agents to leave the safe "childhood" space, given that the agents expect to find higher rewards outside of it. Second, a single negative outcome will not directly lead to a very negative belief about an option, thus inviting further exploration.
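Written out, updates of this kind take the standard Kalman-filter form used in the bandit literature that the implementation follows (Daw et al., 2006; Schulz et al., 2019). The equations below are a sketch consistent with the definitions of $\zeta_{j,t}$, $K$ and $\epsilon$ given above; in particular, entering the error constant as a variance term $\epsilon^2$ in the gain is an assumption.

```latex
\begin{align}
\mu_{j,t} &= \mu_{j,t-1} + \zeta_{j,t}\, K_{j,t} \left( y_{j,t} - \mu_{j,t-1} \right) \\
\sigma^2_{j,t} &= \left( 1 - \zeta_{j,t}\, K_{j,t} \right) \sigma^2_{j,t-1} \\
K_{j,t} &= \frac{\sigma^2_{j,t-1}}{\sigma^2_{j,t-1} + \epsilon^2}
\end{align}
```

As a worked example with the optimistic prior ($\mu_0 = 100$, $\sigma^2_0 = 40$) and, purely for illustration, $\epsilon^2 = 40$: the first gain is $K = 40 / (40 + 40) = 0.5$, so a first outcome of $y = 0$ moves the belief to $100 + 0.5\,(0 - 100) = 50$ and halves the uncertainty to $20$. One disappointing outcome therefore lowers the belief without making it negative.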

Social learning
To understand how social information shapes risky exploration, multiple agents could observe each other while solving the explore-exploit trade-off simultaneously. Each agent expects an option to be more rewarding when other agents also explore it. Here, $N$ is the total number of other agents exploring an option and $\alpha$ maps social mass $N$ to social impact. In our simulations, we set this parameter to 0.8, which means that social impact increases strongly when a few individuals explore this option. When social mass gets bigger, one additional individual's impact declines (Latané, 1981). The "social bonus" is added to the observing agent's utility function ($U$) in the next round. Finally, all options' $U_t$ are fed into a softmax function to obtain the probability that an agent will choose the respective option, $j$.
We do not model developmental changes in the model parameters that govern exploration or social impact; these are assumed to be the same at each developmental stage. Thus, the model is in contrast with theories suggesting that adolescents are more sensitive to social information (Blakemore & Mills, 2014) or rewards (Steinberg, 2008a, 2008b) than children and adults. It was our goal to show how typical adolescent behaviour may emerge simply from the interaction of experience and environmental changes.
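The choice rule can be sketched as follows. Several parts are assumptions made for illustration: the power-law form $N^{\alpha}$ of the social bonus (chosen because it gives the diminishing marginal impact described by Latané, 1981), the additive combination with the belief mean, and the softmax `temperature`. Any exploration bonus is left out, and the function names are hypothetical.

```python
import numpy as np


def social_bonus(n_others, alpha=0.8):
    """Assumed power-law social impact: each additional observed agent
    adds less than the previous one when alpha < 1."""
    return np.asarray(n_others, dtype=float) ** alpha


def softmax(utilities, temperature=1.0):
    """Turn utilities into choice probabilities (numerically stable)."""
    z = np.asarray(utilities, dtype=float) / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


# Three hypothetical options: the first two have equal belief means,
# but three other agents sampled the second one in the previous round.
belief_means = np.array([10.0, 10.0, 5.0])
n_others = np.array([0, 3, 0])

utilities = belief_means + social_bonus(n_others)
probs = softmax(utilities, temperature=5.0)  # temperature is hypothetical
print(np.round(probs, 3))
```

In this sketch the socially endorsed option becomes the most likely choice even though its belief mean is no higher, and the step from one to two observed agents adds more utility than the step from ten to eleven.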

Simulations
Each agent made 1200 sequential decisions: the first 400 in the "childhood environment" and the remaining 800 in the "adolescent environment" (Fig. 2), after new options had been made available to the agents. We performed two sets of simulations. In a solo condition, agents explored the environment alone. In a social condition, 20 agents explored the environment simultaneously and influenced each other according to equation (4). All simulations were performed in R (R Core Team, 2021) with strong reliance on the tidyverse (François et al., 2019).
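Putting the pieces together, a compact Python sketch of one simulation run might look like the following. It is not the original R code. The evenly spaced 12 × 12 grid, the power-law social bonus, the Kalman update with `noise_var` standing in for the error constant, the `temperature` of the softmax and the omission of any exploration bonus are all assumptions, and every default value other than the trial counts, group size, priors and α = 0.8 is hypothetical.

```python
import numpy as np

N_OPTIONS = 144


def simulate(n_agents=20, n_trials=1200, n_child_trials=400, social=True,
             alpha=0.8, temperature=10.0, noise_var=40.0, seed=0):
    rng = np.random.default_rng(seed)
    means, variances = np.meshgrid(np.linspace(-100, 100, 12),
                                   np.linspace(5, 80, 12))
    opt_mean, opt_var = means.ravel(), variances.ravel()
    child_mask = (np.abs(opt_mean) <= 50) & (opt_var <= 40)

    mu = np.full((n_agents, N_OPTIONS), 100.0)   # optimistic prior means
    var = np.full((n_agents, N_OPTIONS), 40.0)   # prior uncertainty
    choices = np.zeros((n_trials, n_agents), dtype=int)
    outcomes = np.zeros((n_trials, n_agents))
    rows = np.arange(n_agents)

    for t in range(n_trials):
        # Childhood: restricted grid; afterwards: all options are available.
        available = child_mask if t < n_child_trials else np.ones(N_OPTIONS, dtype=bool)
        utility = mu.copy()
        if social and t > 0:
            # Social bonus from the OTHER agents' choices in the previous round.
            counts = np.bincount(choices[t - 1], minlength=N_OPTIONS).astype(float)
            others = np.tile(counts, (n_agents, 1))
            others[rows, choices[t - 1]] -= 1.0   # do not count oneself
            utility += others ** alpha
        utility[:, ~available] = -np.inf           # unavailable: probability 0
        z = (utility - utility.max(axis=1, keepdims=True)) / temperature
        probs = np.exp(z)
        probs /= probs.sum(axis=1, keepdims=True)
        chosen = np.array([rng.choice(N_OPTIONS, p=p) for p in probs])

        # Outcome and Kalman update of the chosen option only.
        y = rng.normal(opt_mean[chosen], np.sqrt(opt_var[chosen]))
        gain = var[rows, chosen] / (var[rows, chosen] + noise_var)
        mu[rows, chosen] += gain * (y - mu[rows, chosen])
        var[rows, chosen] *= 1.0 - gain
        choices[t], outcomes[t] = chosen, y
    return choices, outcomes, child_mask


if __name__ == "__main__":
    choices, outcomes, _ = simulate()
    print(choices.shape, round(float(outcomes[-400:].mean()), 1))
```

A solo run under the same assumptions corresponds to `simulate(n_agents=1, social=False)`; repeating either call with different `seed` values gives independent simulations.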

Behavioural measures
To assess the change in risky behaviour across "development", we calculated the average number of explorative decisions made by our agents in bins of 50 consecutive choices. In the multi-armed bandit problem, exploration can mean switching from one arm to another (Daw et al., 2006)¹. Exploring is associated with risk because some options in the environment carry the danger of losses, and there is uncertainty about when and if these options will lead to bad outcomes (Hertwig et al., 2019). To understand the consequences of exploration, we examine how many losses and gains the agents encountered and their average magnitude. Finally, we quantified social learning by calculating how often an agent samples options that other agents sampled previously, in other words how often the agent followed others (again, the average per 50 trials). Given that we were interested in the adolescent period and the transition into adolescence, there is no explicit third transition into an "adult environment". This over-simplification means that in the adult period, no new options are introduced and adults live in the same environment as adolescents (see Discussion). However, for illustration purposes, we analysed the behaviour of our agents within three equally sized bins (childhood, adolescence and adulthood), each corresponding to 400 choices made by the agents.
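The switch-based exploration measure and the outcome summaries can be computed along the following lines. This is a sketch with hypothetical function names; it counts a decision as explorative when the chosen option differs from the previous one, and treats the first decision as a non-switch, which is an assumption.

```python
import numpy as np


def switches_per_bin(choices, bin_size=50):
    """Number of switch decisions in consecutive bins of `bin_size` choices."""
    choices = np.asarray(choices)
    # A decision is a switch if it differs from the previous decision.
    switched = np.concatenate(([False], choices[1:] != choices[:-1]))
    n_bins = len(choices) // bin_size
    return switched[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)


def outcome_summary(outcomes):
    """Count and average magnitude of losses (< 0) and gains (> 0)."""
    outcomes = np.asarray(outcomes, dtype=float)
    losses, gains = outcomes[outcomes < 0], outcomes[outcomes > 0]
    return {
        "n_losses": int(losses.size),
        "n_gains": int(gains.size),
        "mean_loss": float(losses.mean()) if losses.size else 0.0,
        "mean_gain": float(gains.mean()) if gains.size else 0.0,
    }


# Hypothetical agent: stays with option 0 for 50 trials, then alternates.
example = [0] * 50 + [1, 2] * 25
print(switches_per_bin(example))          # prints [ 0 50]
print(outcome_summary([-10, 5, 15]))
```

Applied to one agent's 1200 choices, `switches_per_bin` returns 24 values, eight for each of the three 400-choice stages used in the analysis.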

Exploration and social following
Here we investigate how explorative behaviour changes as a function of experience and the environment in agents who did and did not access social information. As shown in Fig. 4A, both childhood and adolescence are generally characterized by exploratory behaviour that declines with age. Within our simulations, this can be attributed to the many new options that became available simultaneously. Such an adolescent peak and decline in exploration simply emerges from increased opportunities and subsequent learning in the absence of specific adolescent motivation for sensation-seeking or sensitivity to rewards. When agents had the opportunity for social learning, exploration was reduced as compared to solo behaviour during adolescence. We also see that there is substantial social following behaviour in childhood and another peak in adolescence (Fig. 4B). As agents gain more experience, following others declines. In childhood, social information prolongs exploration and increases the variance over different simulations. In adolescence, where the search space is vaster and outcomes differ more strongly from each other, social information helps to find a good solution more quickly and therefore decreases exploration (see Fig. 5).
Fig. 3. Belief update for the example of one option in the environment. A) An agent (the avatar image by LeonardoIannelliCOMPUTE is licensed under CC BY 4.0) approaches each option in the environment with a prior belief about how rewarding this option could be, but is uncertain about it, as can be seen in the spread of the prior distribution. B) Exploring this option will produce outcomes which the agent experiences. C) Experiencing outcomes will help the agent adapt its belief to the environment, as the mean of the agent's beliefs shifts towards the observations and the agent is more certain, as the posterior's spread is smaller than the spread of the prior.
¹ [...] choices are seen as forms of exploration. It can also be argued that exploration is better defined as seeking to reduce uncertainty (cf. Wu et al., 2018). This would mean that sometimes staying with an option is still exploration. This definition requires setting some certainty threshold to separate exploration from exploitation. Using this operational definition of exploration, that is, tallying uncertainty-seeking decisions (decisions over some threshold of σ), yields the same inverted-U pattern as the one reported here (see supplementary Fig. 1).

Experienced outcomes
Here we show the number of positive and negative outcomes encountered by exploring agents in different "developmental stages" and show how severe those outcomes were. Overall, the number of experienced losses declines and the number of experienced rewards increases.
In both metrics, number of losses or gains and their magnitude, social information was beneficial, resulting in more gains and fewer losses as well as better outcomes for gains and losses (Fig. 5A and C). We further observe that adolescent agents experience the most severe losses (Fig. 5B), irrespective of whether social information was available. On average, however, using social information seemed beneficial for adolescent agents in both the loss (Fig. 5C) and the gain domain (Fig. 5D).

Discussion
In his book "The Sciences of the Artificial", Nobel laureate Herbert Simon contemplates the trajectory of an ant wandering on the beach. Looking from above, the ant's path is "irregular, complex, and hard to describe", an apt description of many adolescents' choices. But, as Simon points out, the complexity is in the surface of the beach, not in the ant: "An ant, viewed as a behaving system, is quite simple. The apparent complexity of its behaviour over time is largely a reflection of the complexity of the environment in which it finds itself" (Simon, 2019, p. 52). Here we argue that we also have to pay attention to the interaction between adolescents and their complex environment. We stress that 1) adolescents are required to learn to interact with the novel adult environment (Sercombe, 2014; Willoughby et al., 2013), 2) exploring this environment is inherently risky since it is often uncertain whether the behaviour will be harmful or not (Hertwig et al., 2019; Sani et al., 2012), and 3) the environment is filled with opportunities for social learning (Crone & Dahl, 2012; Nelson et al., 2016). We illustrate how agent-based simulations help to further our intuitions about the interactions between adolescents and their environment.
One of the striking results is that our simulations reproduce a set of very typical adolescent behaviours: 1) an inverted-U shape in risky exploratory behaviour (Romer et al., 2017; Steinberg et al., 2018), and 2) a similar inverted-U shape in peer-following (Braams et al., 2019; Rodriguez Buritica et al., 2019). The increased exploration was associated with selecting risky high-variance options, which resulted in a peak of severe losses during adolescence. However, this risky exploration was beneficial because it helped the agents maximize rewards in the long run. Additionally, social information helped them to avoid some severe losses and learn about good options faster. Importantly, these patterns emerged without needing to invoke specific "motivational" changes in the adolescent agents. That is, the agents' exploration bonus for novel options was the same across all stages, as was the utility attributed to options chosen by others. Instead, the changes in behaviour emerged from changes in the environment (e.g. opening up in adolescence) and experience (rewards and losses). These stylized facts generated in an oversimplified world provide valuable insights into how adolescent-specific risk behaviour may emerge, emphasizing a central role for learning and experience. In the following, we discuss how our perspective relates to findings and theoretical frameworks on adolescent risky and social behaviour and their neural development, suggesting novel avenues for future research.

Adolescent biology
Although the agents in our simulations do not go through any developmental changes, it is indisputable that there are major biological changes during adolescence in humans and animals (Luna et al., 2013; Mills et al., 2014; Worthman & Trang, 2018). First and foremost, the start of adolescence is defined by the start of puberty, which is marked by a significant rise in pubertal hormones. Second, neuroimaging studies revealed considerable changes in brain structure and function during the second decade of life. According to dual systems models of neural development (Luna et al., 2013; Nelson et al., 2016; Shulman et al., 2016), risk-taking peaks during adolescence because of a maturational imbalance between an early-maturing dopaminergic reward processing system and a still immature cognitive control system that is not (yet) strong enough to restrain reward-seeking impulses. Given the co-occurrence of puberty and maturational imbalance, it was suggested that hormonal changes related to puberty drive developmental changes in risk-taking (Braams et al., 2015).
We do not argue against the empirical findings that form the basis for these theories; however, we argue for a broader perspective, stressing that the interaction with the environment leads to experience, and experience itself leads to changes in brain and behaviour (Romer et al., 2017; Sercombe, 2014). For instance, dopamine and pubertal hormones are also known to enhance learning. A body of work shows that pubertal hormones play a pivotal role in regulating the mechanisms of experience-dependent neuronal plasticity during adolescence (for review, see Laube et al., 2020). In addition, changes in dopamine function during adolescence may play a key role in experience-based fine-tuning of neural systems (Murty et al., 2016). Early on it was pointed out that developmental changes in grey matter most likely reflect experience-based pruning of cortical networks (Giedd et al., 1999), but environmental changes can also cause changes in the mesolimbic dopamine system. For instance, the mesolimbic dopamine system's in vivo activity is enhanced by moving rats to an enriched environment (Segovia et al., 2010). A recent study showed that both striatal dopamine release and dopamine synthesis capacity are significantly elevated in immigrants compared to non-immigrants (Egerton et al., 2017). Thus, it is conceivable that the enhanced level of striatal dopamine, or other neural changes, associated with adolescence is also a response to being confronted with a novel and stressful social environment, rather than just a biological timer going off. In sum, a broader perspective on adolescent risk may also bring some nuance to the interpretation of current neuroimaging findings.

Fig. 4. A) Exploration (y-axis) by timepoints in the simulations (x-axis) and whether agents had access to social information (green) or not (yellow). Each shape denotes the average number of switches over the past 50 decisions per developmental stage and depending on whether agents had access to social information. Small transparent shapes denote individual simulations, and large shapes cover the mean and 95% CI of the mean of all simulations. Explorative decisions are defined as the decision to switch from one option to another. While both childhood and adolescence can be characterized by relatively high exploration, the adolescent environment leads agents to explore their environment for a prolonged time. B) Decisions to explore an option that others had sampled before (y-axis), when the agent explored independently before. Both exploration and social following increase in "adolescence" when there are novel options to explore. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Outcomes experienced for non-social (yellow) and social (green) agents. Outcomes that are either negative (Loss) or positive (Gain) by age group for each of the 100 simulations (dots). The first row depicts the cumulative count of A) losses and C) gains. The second row shows the magnitude of a given outcome for B) losses and D) gains. As agents progress through the developmental stages, they encounter fewer losses and more gains. Notably, during adolescence, the social following rule induces a greater variability in the outcomes that agents received on average, compared to simulations that did not include social information. This was true for negative (SD = 19.69 vs SD = 16.83) but also positive (SD = 32.43 vs SD = 29.28) outcomes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

S. Ciranka and W. van den Bos

Understanding risk
We argue that to understand adolescent risk-taking, there is a need to conceive risk-taking not only as impulsive or flawed behaviour but also as an exploratory activity that resolves uncertainty, is necessary to achieve developmental milestones, generates wisdom and knowledge (Rivers et al., 2008), and is often planned (Romer et al., 2017). Refocusing on experience and learning has consequences for studying risk-taking in the laboratory. That is, instead of static forced-choice decision experiments, paradigms involving uncertainty or necessitating exploration might prove more valuable for understanding laboratory correlates of real-life risk-taking (Frey et al., 2017; Rosenbaum et al., 2018). Indeed, experimental research studying risk-taking under uncertainty (Blankenstein et al., 2016; Braams et al., 2015; van den Bos & Hertwig, 2017) or exploration (Somerville et al., 2017) elicits behaviour that is predictive of real-life risk-taking. The advantage of experimental studies is that they diminish the role of developmental differences in prior experience, knowledge, or exposure to risky situations by confronting everybody with a novel environment (Defoe et al., 2019).
Throughout this manuscript, we used Bayesian reinforcement learning to quantify intuitions arising from focusing on exploration across development (Frankenhuis & Barto, 2019), but the simulated environment also allows for a concrete implementation as an experiment (Schulz et al., 2019; Wu et al., 2018). If children, adolescents and adults were confronted with our environment in an experiment, this experiment would be sensitive to developmental differences in exploration not induced by their ecology but by differences in internal drives. Indeed, evidence from self-report and experimental studies shows that novelty and sensation seeking peak during adolescence (Crone et al., 2008; Maslowsky et al., 2019; Wills et al., 1994). An increase in novelty seeking translates to more optimism in our agents and would lead to increased exploration. Sensation seeking, unlike novelty seeking, involves a preference for activities that have high variance in expected value (Zuckerman et al., 1978). Translating this to our simulations and models, a sensation-seeking agent would seek out uncertainty and prefer options in the top quadrants of Fig. 2, containing high-variance options. Individuals with a propensity for sensation seeking will be driven to explore more, given that all unknown options are associated with high variance, and finally converge somewhere on the top right, generally experiencing positive outcomes, but also some infrequent very negative ones. If sensation seeking declined with age, the agents would move to options with lower variance. Although we illustrated here that an adolescent-specific increase in novelty seeking (Cloninger, 1986) or sensation seeking (Romer et al., 2017) is not a necessary prerequisite to explain an adolescent rise and fall in risky behaviour when considering their ecology, we believe a more complete model, for instance the Developmental Neuro-Ecological Risk-taking Model (Defoe, this issue), incorporates both psychological and ecological factors.
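A minimal sketch of this intuition follows. It is not the implementation behind the simulations reported here: it assumes a hypothetical utility of the form mean + weight × SD, four made-up options standing for the quadrants of a mean-variance plane (with high variance at the top and high mean on the right, an assumed layout), and an illustrative parameter `variance_weight` for sensation seeking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    mean: float  # expected outcome (hypothetical value)
    sd: float    # spread of outcomes (hypothetical value)


# Four illustrative options, one per quadrant of a mean-variance plane.
OPTIONS = [
    Option("low mean, low variance", -1.0, 1.0),
    Option("high mean, low variance", 2.0, 1.0),
    Option("low mean, high variance", -1.0, 6.0),
    Option("high mean, high variance", 1.5, 6.0),
]


def preferred(options, variance_weight):
    """Return the option with the highest assumed utility: mean + weight * sd."""
    return max(options, key=lambda o: o.mean + variance_weight * o.sd)


# A declining weight mimics sensation seeking that falls with age.
for weight in (0.5, 0.25, 0.0):
    print(weight, preferred(OPTIONS, weight).name)
```

Under these assumed numbers, an agent with a positive weight settles on the high-mean, high-variance option, whereas an agent with a weight of zero prefers the high-mean, low-variance option.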

Peer influence and social learning
In our simulations, there was a peak of social information use at the beginning of "adolescence" when novel opportunities arose for our agents, suggesting social sensitivity is related to exploration and uncertainty. In the real world, adolescence is a period of major social upheaval. During this period, adolescents become preoccupied with how their peers view them (Somerville et al., 2013) and how they fit into their social groups (Coleman et al., 1977). One might argue that the social context becomes an adolescent's main source of uncertainty. Learning how this social world works, who they are, and where they fit in are major developmental tasks for adolescents (Nelson et al., 2016). The mere presence of peers is arousing to adolescents, which may shift the neural balance between reward and control such that it leads to an increase in (impulsive) risk-taking (Chein et al., 2011; Gardner & Steinberg, 2005; Shulman et al., 2016). Others have emphasized that some risk-taking behaviour might aim at reaching social goals, such as status and belonging (Blakemore & Mills, 2014; Blakemore, 2018; Telzer et al., 2018). Here, we highlight another aspect: following or copying others' behaviour can be a smart form of social learning (Bandura, 1962). Research in adult social learning has shown that social information use often follows a basic principle, which is that people use more social information when they are more uncertain or feel less confident (Ciranka & van den Bos, 2020; De Martino et al., 2017; Molleman et al., 2014; Moutoussis et al., 2016; Toelch & Dolan, 2015; Tump et al., 2020). In a novel environment, using social information is beneficial because it informs individuals about good options without the potential dangers of trial-and-error learning² (Hoppitt & Laland, 2013; Mehlhorn et al., 2015; Todd & Brighton, 2016). In line with that, we show that agents who transition into the adolescent environment, in which they are maximally uncertain, use social information most. In addition, we show that by combining their knowledge, social agents converge more quickly on better options than when searching for these alone. There are still some severe losses, and sometimes the agents followed a bad example, but overall "adolescent agents" benefited from following their peers. Since social learning allows individuals to avoid experiencing bad outcomes, conformity may be particularly strong for avoidance learning. Several studies have shown that adolescents may also be specifically sensitive to social information promoting risk-avoiding behaviour (Braams et al., 2019; Chung et al., 2020; Ciranka & van den Bos, 2019; Engelmann et al., 2012), which could reflect an adaptation to their uncertain ecology.
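Why a constant social utility can still produce more following under uncertainty can be sketched in a few lines. The example below is an illustration under stated assumptions, not the simulation code: the function `choose`, the value lists and the bonus of 1.0 are all hypothetical. It only shows that the same fixed bonus tips the choice when options look equally promising, but not once experience has separated them.

```python
def choose(values, peer_counts, social_bonus=1.0):
    """Pick the option with the highest value plus a fixed bonus per observed peer.

    `values` are the agent's current (assumed) valuations of each option and
    `peer_counts` the number of peers seen choosing each option.
    """
    scores = [v + social_bonus * n for v, n in zip(values, peer_counts)]
    return scores.index(max(scores))


peer_counts = [0, 0, 1]  # one peer was observed choosing option 2

# Novel environment: nothing is known yet, so all options look equally promising.
novel_values = [10.0, 10.0, 10.0]
# Familiar environment: experience has separated the options.
familiar_values = [1.0, 3.2, 0.4]

follows_when_uncertain = choose(novel_values, peer_counts) == 2
follows_when_experienced = choose(familiar_values, peer_counts) == 2
print(follows_when_uncertain, follows_when_experienced)
```

With these assumed numbers the agent follows the peer only in the novel environment, although the social bonus itself never changes.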
The social learning perspective also raises questions. If adolescent risk-taking can also be characterized by social learning and depends on uncertainty, identifying adolescents' uncertainties will help to understand where they are most likely to give in to peer pressure, for two reasons. First, situations in which adolescents are uncertain about whether some behaviour is "worth the risk" will be those situations where they are most susceptible to peer influence. Second, uncertainty about others can influence adolescent risk-taking when social learning is not possible. This is because uncertainty itself is related to acute stress responses and arousal in humans when they anticipate negative outcomes (de Berker et al., 2016) or take risks (FeldmanHall et al., 2016). When adolescents find themselves observed by others, they more often anticipate negative outcomes, such as being rejected or embarrassed, than children or adults do (Crone & Konijn, 2018; Pickett et al., 2004; Rodman et al., 2017; Somerville et al., 2013). Thus, their uncertainty about others' mental states in combination with their bias towards predicting negative social outcomes may contribute to adolescents' arousal, which is, in turn, thought to nudge them into reward-sensitivity and risk-taking in social contexts (Shulman et al., 2016).

² Social information is not always good and conformity can also lead to bad outcomes. We also find in our simulations that if there is too much conformity, this can lead to suboptimal outcomes.

Limitations and extensions
Exploratory behaviour in our simulations declines not only because the agents are learning but also because there is a finite set of options. In the real world, however, adults take on multiple roles that provide different opportunities and risks (Willoughby et al., 2013). Our choice not to consider different risky options for adults mimics the kinds of behaviours research on adolescent risk-taking is usually concerned with. The risks that adolescents engage in are hardly available for children and are novel for adolescents (2019; Baumrind, 1987; Defoe et al., 2015), but all have long been available to adults. For instance, the selected items on adolescent risk-taking questionnaires mainly include activities related to substance abuse, risky driving, and sex (Gullone et al., 2000). These differences in risk-exposure and novelty can explain why adolescents' exceptional risk-taking in the real world (Steinberg et al., 2018) fails to replicate in laboratory tasks, where the "environment" is the same for everyone (2019; Defoe et al., 2015). That is, the meta-analyses by Defoe et al. (2015) suggest that children take equal risk compared to adolescents in laboratory tasks. This is in line with our model, which predicts that as soon as the environment changes and novel risky options become available, risk-taking will increase. Relatedly, different risks might be easier to explore at different times during development.
It is an empirical question whether options that become available to adults have the same potential for harm on the individual level, but clearly novel opportunities will arise. Simulations could integrate an ever-increasing set of options by adding them later, resulting in another increase in exploratory behaviour, given that our agents' priors were the same for every new option. Thus, our model does not predict that adults would not take risks or explore anymore, but assumes risk-taking to be determined by the interaction between agents and their environment. By these means, the model can explain why risk-taking in certain areas significantly reduces across adolescence, presumably based on experience, while other risks, for instance white-collar crimes, which may be much more harmful to society than the risks adolescents take, have a much later peak (Benson & Kent, 2001; Willoughby et al., this issue). Furthermore, our model predicts that significant changes in adults' ecology will result in a new spike in exploration and social following behaviour. Although it is a common trope that adolescents will jump off the bridge if all their friends did, there is plenty of evidence of adults showing the same herding behaviour when there is uncertainty, for instance in real estate markets (Babalos et al., 2015) or cryptocurrencies (Coskun et al., 2020). More recently, following the Covid-19 outbreak, we have seen herding in hoarding of toilet paper in several countries around the world (Garbe et al., 2020; Kirk & Rifkin, 2020). Thus, when there is novelty and uncertainty, explorative or risky behaviour and social susceptibility will re-occur in adulthood. Exploring this in further studies based on sound intuitions about what type of affordances the adult environment provides will be most insightful.
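The prediction that adding options later produces a new burst of exploration can be illustrated with a toy simulation. This is a sketch under stated assumptions rather than the model used in this article: it assumes Gaussian beliefs with a known noise level, an upper-confidence-bound choice rule with a constant `bonus`, identical priors for every option, and made-up payoffs and trial counts.

```python
import random


def simulate(n_trials=300, unlock_at=150, bonus=2.0, seed=1):
    """Toy learner: three options at first, three more unlocked at `unlock_at`."""
    rng = random.Random(seed)
    true_means = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5]  # hypothetical payoffs
    noise_sd = 1.0
    prior_mean, prior_var = 0.0, 25.0            # same prior for every option
    means = [prior_mean] * len(true_means)
    variances = [prior_var] * len(true_means)
    available = [0, 1, 2]
    choices = []
    for t in range(n_trials):
        if t == unlock_at:
            available = list(range(len(true_means)))  # the environment opens up
        # Value = believed mean + constant bonus * remaining uncertainty (SD).
        choice = max(available, key=lambda k: means[k] + bonus * variances[k] ** 0.5)
        reward = rng.gauss(true_means[choice], noise_sd)
        # Gaussian posterior update with known observation noise.
        gain = variances[choice] / (variances[choice] + noise_sd ** 2)
        means[choice] += gain * (reward - means[choice])
        variances[choice] *= 1.0 - gain
        choices.append(choice)
    return choices


def count_switches(choices, start, stop):
    """Number of trials in [start, stop) on which the choice differs from the previous one."""
    return sum(choices[i] != choices[i - 1] for i in range(max(start, 1), stop))


choices = simulate()
switches_before = count_switches(choices, 140, 150)
switches_after = count_switches(choices, 150, 160)
print(switches_before, switches_after)
```

Because the exploration bonus never changes, any rise in switching right after trial 150 in this sketch comes from the newly available, still uncertain options and not from a change in the agent.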
Further, although agents can suffer losses in simulations, they could never get hurt or even die. Introducing this possibility would generate evolutionary dynamics, such as loss aversion, and would call for using social information strategically, something adolescents are known to do (Chung et al., 2015; Ciranka & van den Bos, 2019). Generally, studies from across the biological and social sciences suggest that people use social information strategically; they are selective as to who they turn to for useful knowledge (Hoppitt & Laland, 2013). Developmental studies have shown that adolescents are more likely to rely on expert advice than adults when taking financial risks (Engelmann et al., 2012). On the other hand, peers might be a more important information source to adolescents when it comes to risk-perception (Knoll et al., 2015). It would be of great interest to study how decisions about when and whom to learn from operate across adolescence, specifically because the literature suggests that adolescents are exceptionally sensitive to social status (Yeager et al., 2018). Finally, the model assumes that all individuals initially have equal opportunities to benefit from the environment. In reality, parents' socioeconomic status influences the risks and opportunities that children are exposed to (Frankenhuis, Panchanathan, & Nettle, 2016; Worthman & Trang, 2018). It will be insightful to quantify how such inequalities impact risky behaviour during the adaptive mind's development.

Summary and conclusion
The ecological approach has a long tradition in developmental (Bronfenbrenner, 1979) and decision science (Simon, 1956). It proposes that cognitive and motivational systems are shaped, by evolution or development, to take advantage of the external environment's structure (Gigerenzer & Gaissmaier, 2011; Todd & Brighton, 2016). Thus, to understand behaviour, it is necessary to understand the environment it occurs in. Here, we highlighted the role of learning and experience in this process. In the past decades, much research has focused on the adolescent mind's inner workings to understand the mechanisms behind adolescent risk-taking. Our agent-based models illustrate that exploration and adaptation to an uncertain environment itself can give rise to typical adolescent patterns in risk behaviour and peer influence without assuming developmental changes in internal drives. Although the claim that risk-taking may be adaptive is not new, we point out that models of adolescent risk-taking and peer influence must integrate elements of learning, experience and the environment that adolescents adapt to. Such models would paint a fairer picture of adolescents, not just as individuals with unfinished brains and raging hormones, but as active learning agents who are exploring a new and uncertain world.