Distinguishing true from false confessions using physiological patterns of concealed information recognition – A proof of concept study

Wrongful conviction cases indicate that not all confessors are guilty. However, there is currently no validated method to assess the veracity of confessions. In this preregistered study, we evaluate whether a new application of the Concealed Information Test (CIT) is a potentially valid method to make a distinction between true and false admissions of guilt. Eighty-three participants completed problem-solving tasks, individually and in pairs. Unbeknownst to the participants, their team-member was a confederate, tempting the participant to break the experimental rules by assisting during an individual assignment. Irrespective of actual rule-breaking behavior, all participants were accused of cheating and interrogated. True confessors but not false confessors showed recognition of answers obtained by cheating in the individual task, as evidenced by larger physiological responses to the correct than to plausible but incorrect answers. These findings encourage further investigation on the use of memory detection to discriminate true from false confessions.

In Bruton v. United States (1968), the United States Supreme Court declared that a defendant's confession "is probably the most probative and damaging evidence that can be admitted against him" (p. 8). Indeed, people find it difficult to believe that anyone would confess to a crime they did not commit (Kassin & Wrightsman, 1980, 1981). A confession can thereby have an overwhelming impact on jurors and judges. Research has indicated that even when no further evidence linking a suspect to a crime was presented, the mere presence of a confession tripled the chance to be found guilty at trial rather than to be acquitted (Leo & Ofshe, 1998). This may explain why false confessions contribute to almost 29 % of the cases investigated by the Innocence Project (www.innocenceproject.org; see also Drizin & Leo, 2004; Garrett, 2011). The lasting detrimental effects of wrongful convictions on all involved parties highlight the need for independent measures to verify the veracity of a confession.
Given the profound influence of confession evidence on trial outcomes, all confessions should be reviewed with caution to determine their veracity. To do so, it is important to look at personal characteristics of the suspect (e.g., dispositional risk factors such as youth or cognitive impairment; Drizin & Leo, 2004; Gudjonsson, 2003) and situational factors (e.g., confrontational interrogation tactics such as the presentation of false evidence and minimization; Kassin, 2015) that might increase the probability that an innocent suspect signs a confession. Yet, even when mock-jurors read transcripts of a murder trial in which the defendant's confession was highly coerced and ruled as inadmissible by the judge, they still did not fully discount the confession when reaching a verdict (Kassin & Sukel, 1997). This pattern is found even when the confession was reported secondhand by an informant who was motivated to implicate the defendant (Neuschatz, Lawson, Swanner, Meissner, & Neuschatz, 2008; Wetmore, Neuschatz, & Gronlund, 2014). Knowledge of possible risk factors might therefore not be sufficient for laypeople to discriminate true from false confessions.
However, even for trained investigators, district attorneys, and judges it is challenging to distinguish true from false confessions. In a study by Kassin, Meissner, and Norwick (2005), prison inmates provided self-incriminating statements that were either true or false. The recorded statements were observed by police investigators who judged the veracity of the confessions. Although they were highly confident in their decision, the overall accuracy rate (53.9 %) was not significantly different from a random guess. Likewise, in a replication study specifically targeting confessions made by juvenile delinquents, the percentage of correct judgements remained at chance level (52.8 %; Honts, Kassin, & Craig, 2014; see also Honts, Forrest, & Stepanescu, 2019). Indeed, false confessions seem indistinguishable from true confessions both in terms of the details provided (Garrett, 2010) and other content cues (e.g., expressions of remorse; Appleby, Hasel, & Kassin, 2013).
While false confessions represent a minority in all criminal cases handled by the courts, the consequences can be detrimental when they are not correctly recognized as such. False admissions of guilt have been shown to taint eyewitness identifications (Hasel & Kassin, 2009), statements from alibi witnesses (Marion, Kukucka, Collins, Kassin, & Burke, 2016), and forensic experts (Kassin, Dror, & Kukucka, 2013), and to contribute to tunnel vision amongst investigators. Moreover, if the main focus of the criminal investigation locks in on an innocent suspect, the true perpetrator escapes capture, remaining free to commit other crimes (e.g., the Central Park jogger case; Burns, McMahon, & Burns, 2012). Meanwhile, wrongful imprisonment can stigmatize the innocent (Clow & Leach, 2015) and trigger severe mental health issues (Grounds, 2005; Scott, 2010).
In the current study, the applicability of the Concealed Information Test (CIT; Lykken, 1959; Verschuere et al., 2011) to evaluate the veracity of confessions is explored. A key distinction between guilty and innocent suspects lies in the involvement in the crime and physical presence at the scene. By assessing the recognition of intimate details derived from the investigation of the criminal act, the CIT targets the perpetrator's knowledge and may thereby distinguish true from false admissions of guilt.
The CIT examination involves the presentation of several questions, each followed by one true detail of the crime and several plausible control items, for example 'Where was the victim's body found? a) bathroom, b) kitchen, c) bedroom, d) garden, e) living room'. During the sequential presentation of all items, psychophysiological responses (most commonly skin conductance, heart rate, and respiration) are measured. Innocent examinees, who are unaware of factual information about the location of the victim in the house, are expected to respond similarly to all presented alternatives. However, in the knowledgeable examinee, recognition of salient crime details will be apparent from an increase in skin conductance (i.e., orienting response; Lykken, 1974; Sokolov, 1963), and from a deceleration of the heart rate and suppression of the respiration cycle (i.e., arousal inhibition; klein Selle, Verschuere, Kindt, Meijer, & Ben-Shakhar, 2016, 2017).
The CIT was introduced by David Lykken, who reasoned that physiological responses should be used not to detect lying in itself, but rather to verify the presence or absence of crime-related details in the memory of the suspect. In the first study (Lykken, 1959), 49 participants were randomly assigned to commit mock-crimes or remain innocent. In the subsequent examination, all participants were asked questions targeting intimate details of the crimes that would only be familiar to those actually involved. The comparison of skin conductance responsivity upon presentation of critical items to that elicited by control items revealed a high classification rate (nearly 95 %), providing initial evidence for the validity of the CIT. A more recent meta-analysis (Meijer, klein Selle, Elber, & Ben-Shakhar, 2014) supports the validity of multiple physiological measures to correctly detect both the presence and the absence of intimate crime knowledge, showing large effects for the skin conductance, heart rate and respiration measures (Cohen's d of 1.55, 0.89, and 1.11, respectively).
In the current study, we investigated whether true confessions can be distinguished from false confessions using psychophysiological memory detection. We used a variation of the procedure described by Russano, Meissner, Narchet, and Kassin (2005) to elicit true and false confessions in a laboratory environment. Similar deception paradigms were used by Exline, Thibaut, Hickey, and Gumpert (1970). Participants were paired with a research confederate and faced with various problem-solving tasks, some of which had to be solved individually and others as a pair. During the trivia quiz that had to be solved individually, the research confederate actively sought help from the participant. Participants who broke the experimental rule by assisting the female confederate with her set of questions and giving her the correct answers were thus guilty of cheating; those who merely solved their own set of questions were deemed innocent. Independent of actual guilt, all participants were accused of cheating and interrogated by the experimenter in an accusatory manner. Both true and false confessors were examined using the CIT, to verify whether they showed recognition of the answers that they could have solved for the confederate (see Fig. 1). We expected that true confessors would exhibit differential physiological responses to the critical items in comparison to control items, while false confessors would show similar responding to all items, indicating non-recognition of the confederate's answers.

Method
Ethical approval was obtained from the Ethics Review Board of the University of Amsterdam and archived under number 2018-CP-9071. All participants provided written informed consent before taking part in the study, stating that study participation was voluntary and the experiment could be terminated at any time without consequences. This study was preregistered on https://osf.io/pbjt5. Task scripts, data and other materials are publicly available on https://osf.io/9dk5g.

Participants
Participants were recruited through a university portal and received course credits or a monetary compensation. Participants were required to be fluent in Dutch and between the age of 18 and 40. The initial sample (i.e., before exclusion; see below) consisted of 83 individuals (79.5 % female), and participants were on average 21.86 years old (SD age = 3.73, range from 18 to 38).
In the current paradigm, a confederate requested participants to help with two trivia questions while the pair was instructed to solve them individually. Based upon their performance on the trivia quiz with the confederate, the participants were classified as guilty (i.e., those solving the confederate's individual trivia questions and thereby violating the experimental rule), or innocent (i.e., those who refused to help the confederate and worked independently during the individual tasks), see Fig. 1. Twenty-two participants were excluded for several reasons: eighteen participants only knew the answer to one of the two cheating questions; two participants initially agreed to assist the confederate, but later declined (due to knowledge contamination, these participants were also excluded from analyses); and two participants exercised their right to withdraw participation, after which their experimental session was immediately terminated and their data were excluded from analyses.
The remaining sample consisted of 28 individuals who were guilty of helping the confederate and thereby breaking the experimental rule (M age = 21.68, SD age = 4.26), and 33 innocents (i.e., those who refused to help the confederate and worked independently during the individual tasks, M age = 21.73, SD age = 3.45).

Procedure
Participants were recruited for research on the influence of personality on problem-solving, disguising the true aim of the experiment. The experiment was divided into five distinct parts: personality questionnaires (for exploratory analyses), problem-solving assignments (allowing to engage in rule-breaking behavior), the interrogation (aimed at eliciting a confession), the CIT (aimed at distinguishing true from false confessions) and an elaborate debriefing.

Personality questionnaires
Before participating in the study, participants received the link to an online form to complete the 60-item HEXACO Personality Inventory Revised (60-HEXACO-PI-R; Ashton & Lee, 2009) and the Gudjonsson Compliance Scale (GCS; Gudjonsson, 1989), followed by demographic information on age, gender and field of study. At least seven days after completing the online questionnaires, participants were invited to the experimental session in the lab. Exploratory analyses using these measures are provided in Supplementary Appendix A.

Problem-solving assignments
Upon arrival, participants were met by two female research assistants who alternated the roles of experimenter and confederate posing as another participant. Participants were asked to leave all personal belongings in the adjacent room and wash their hands in preparation of the physiological measurements in a later stage of the study. Then, the pair was escorted into a small testing room with blinded windows and asked to read the information brochure and sign the informed consent.
Generally following the paradigm described by Russano et al. (2005), the experimenter provided each participant with a booklet including three individual problem-solving assignments followed by three group tasks. The experimenter instructed that it was important for the results to comply with the instructions on whether the assignment should be solved individually or in pairs, thereby implying that cooperating on the individual tasks was not allowed. After instructing participants to set a timer with an alarm after five minutes on the computer for each of the six tasks, the experimenter left the room and the participants started completing their own problem-solving booklet.
During the last individual task, the confederate feigned difficulties with answering two trivia quiz questions presented in her booklet. Approximately three minutes into the task, the confederate stated they were probably not allowed to work together, but nevertheless asked whether the participant would be willing to solve two questions. The confederate requested help without actually showing the participants the questions until the participant agreed to provide help. Upon agreement of the participant (thereby breaking the critical experimental rule), the confederate showed her booklet and the critical questions to the participant (i.e., 'Which social media logo is shown here?' and 'Which continent hosted the Winter Olympics this year?') and asked for the correct answers (i.e., Snapchat and Asia). If the participant declined helping the confederate after a second request, the confederate did not attempt to seek further information from the participant.
After participants finished all problem-solving tasks, or after the final timer, the experimenter returned to the room and collected the booklets in order to review correctness of the answers. In fact, the experimenter did not check the answers at this stage, so as to remain blind to whether participants had actually cheated. After roughly two minutes, the experimenter returned to the participants and said that she wanted to talk to both participants separately, starting with the confederate. During that time, the participants waited alone for five minutes in the small experimental room.

Interrogation
The interrogation protocol and the physical layout of the room followed the guidelines recommended by Inbau, Reid, Buckley, and Jayne (2013). The protocol was designed by the first author, who completed training and is certified in the Reid technique. Although this coercive technique has been criticized (see Gudjonsson, 2003), we deliberately used the guidelines in an attempt to provoke a large number of false confessions.
While the confederate remained in the adjacent room, participants sat in the small experimental room, equipped with two chairs, a desk and bare walls. The participant was always seated in the far corner of the room, so that the experimenter sat between the participant and the door during the interrogation. The experimenter (blind to actual cheating behavior) entered the room and directly accused the participant of not following instructions by working together on an individual task. The experimenter stated that the cheating behavior posed a problem as the data were intended to be published in a scientific journal. Lastly, the experimenter said that she should probably inform her supervisor about this problem but was reluctant to do so as this would bear unknown consequences.
The experimenter then sat down next to the participant and adopted a more open attitude to offer the deal. She pretended to have come up with a possible solution, in which the participant would make a new appointment to re-do the experiment with a new partner. This way, the data of the current session could be overwritten and the supervisor did not have to be alerted. Participants were informed that they would receive participation credits after the new appointment. While holding this monologue, the experimenter wrote down a confession statement to be signed by the participant who accepted the deal. The statement read 'I [name of participant] confess that I worked together with another participant on a task that had to be solved individually'. During the entire interrogation phase, participants' objections and denials were interrupted. Minimization techniques were also actively deployed to elicit a confession, specifically targeting prosocial behavior and absence of malicious intent (e.g., 'I would also help another person if they would ask me to' and 'I am sure you didn't know that it would influence the results'). When these tactics did not result in a signed confession, the experimenter pretended to look for the supervisor and returned to the participant stating that the supervisor was lecturing for 30 more minutes. If the participant expressed the desire to wait for the supervisor (and did not confess), the experimenter aborted the interrogation and continued to the CIT.

CIT
In this part of the study, the experimenter attached the respiration belts as well as the skin conductance and heart rate electrodes and conducted the full CIT procedure (due to hardware failure, the respiration line length (RLL) signal could not be analyzed and is not reported). Participants were told that they were suspected of cheating and instructed to prove their innocence in the subsequent polygraph test. To mimic the situation in which an unknowledgeable outcome would be favorable for the suspect, we offered participants a €5 (or an equivalent in participation credits) incentive for successful concealment of their knowledge of the correct answers. The bonus was paid to all participants regardless of their actual CIT results.
Following an initial rest period of two minutes after attachment of the electrodes, participants were presented with the CIT questions and answering alternatives. The alternative answers to the two questions the participants could have cheated on during the trivia quiz were presented on the screen either as an image or as a written word (see Fig. 2 and Supplementary Appendix B for items). In addition to the presentation on the computer screen, participants heard the spoken version through their headphones. Participants were asked to respond with a verbal 'no' to each alternative. The order of the questions and their answering alternatives was randomly determined. A self-paced break was given after the first four questions to maintain participants' attention. The audio files were pre-recorded by a third party who was blind to the procedure. The question remained on the screen for 10 s, followed by the answering alternatives that were presented for 5 s each, with a mean inter-stimulus interval of 18 s (range from 16 to 20). Between item presentations, a fixation-cross appeared on the screen to maintain the attention of the participant.
The first answering alternative following the question was always a buffer item designed to absorb the initial orienting response. Subsequently, the critical item, four control items and a single catch item were presented in a random order. Catch items were included in the CIT to ensure the participants' attention to all presented items. Upon identifying the catch item amongst the presented alternatives, participants were instructed to repeat this specific item verbally. In the current CIT, the catch items consisted of random numbers ranging from 1 to 10, either in written words or as an image. Apart from repeating the catch items, participants were instructed to respond to all other items with a verbal "no". Altogether, participants were presented with two blocks of four questions, each consisting of seven items (i.e., 1 buffer, 1 critical, 4 control, and 1 catch item), totaling 56 items. For a visual representation of the CIT, see Fig. 2.
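The item structure of a single CIT question can be sketched in a few lines of Python. This is a minimal illustration only, not the Presentation script used in the study; the function name `build_question_trials`, the buffer label and the catch label are hypothetical placeholders.

```python
import random

def build_question_trials(buffer_item, critical, controls, catch, rng=random):
    """Return the item order for one CIT question.

    The buffer item always comes first; the critical item, the four
    control items and the catch item follow in random order.
    """
    if len(controls) != 4:
        raise ValueError("expected exactly four control items")
    rest = [("critical", critical)]
    rest += [("control", c) for c in controls]
    rest += [("catch", catch)]
    rng.shuffle(rest)
    return [("buffer", buffer_item)] + rest

# Hypothetical example: "Europe" (buffer) and "seven" (catch) are placeholders.
trials = build_question_trials(
    "Europe",
    "Asia",
    ["South-America", "North-America", "Africa", "Antarctica"],
    "seven",
    rng=random.Random(1),  # seeded only to make the sketch reproducible
)

# Two blocks x four questions x seven items per question.
n_items = 2 * 4 * len(trials)
```

Passing a seeded `random.Random` instance keeps the order reproducible for checking; an actual session would presumably use an unseeded generator.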
Recall and recognition memory tests were administered to examine whether participants correctly remembered the critical items from the trivia quiz. First, the two quiz questions for which the confederate asked the participant for help were sequentially presented on the computer screen with a text balloon in which participants could freely enter the answer. All participants were informed of the option to enter 'I don't know'. Answers were coded as either correct (1) or incorrect (0), leading to a total possible score of 2, reflecting perfect recall. Then, both cheating questions were counterbalanced to be depicted twice in a forced-choice format, with five answer options (i.e., the critical and four control items) being presented once in written words and once with images. Participants were asked to select the most appropriate option if they did not know the correct answer. Answers were coded as either correct (1) or incorrect (0), leading to a recognition scoring range from 0 to 4. Then, participants rated their general experience of being accused of cheating on a 7-point Likert scale ranging from 1 (very negative) to 7 (very positive). Also, they were asked to rate the amount of stress they had experienced during the interrogation, on a 7-point Likert scale ranging from 1 (not at all stressful) to 7 (very stressful). Then, participants rated six questions designed to assess their motivational state on a 5-point Likert scale. This questionnaire measured how well participants were able to focus on the screen during the CIT, how involved they were in the study and how much they tried to avoid detection and appear innocent on the CIT. Finally, participants rated their effort to suppress or raise their physiological responses during the test and freely elaborated on whether they had used strategies to avoid detection.

Debriefing
All participants were asked what they thought the study was about. From there, the experimenter fully described the procedure and the true nature of the study. The act of misleading the participants to reach the goal of the study was explicitly mentioned and explained. Participants were told that the set-up of the study was specifically designed to aid cheating behavior, adding that cheating was an admirable prosocial act in this case. Furthermore, participants were informed that no consequences would follow and that the supervisor was in fact not alerted. Also, the risks and dangers of falsely confessing to an act not committed were explicitly discussed with participants. Participants received an extensive written debriefing, explaining the relevance of the study and how minimizing interrogation techniques may lead to wrongful convictions. Finally, participants were asked not to discuss the real aim or procedure of the study with others.

Data acquisition and reduction
The experiment was conducted in an air-conditioned laboratory. Item presentation was performed using Presentation® software (Version 18.0, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com). Psychophysiological responses were measured and recorded with Vsrrp89 software, developed by the Technical Support Social and Behavioral Sciences at the University of Amsterdam.
Electrodermal activity was recorded with an amplifier using a 50 Hz, sine-shaped excitation voltage with an amplitude of 1 Vpp. Two curved-shape sintered silver-silver chloride (Ag/AgCl) electrodes (20 × 16 mm) were connected to the palmar surface of the distal phalanges of the left index and left ring finger with adhesive tape. The Skin Conductance Response (SCR) was measured from 1 s to 5 s after item onset and defined as the maximal increase in conductance during this time window.
The ECG measure was acquired by placing a set of three Ag/AgCl electrodes (3M™ Red Dot™ disposables, type 2249-50) in a standard Einthoven lead-II configuration: one electrode attached near the distal end of the left collarbone, one electrode placed near the distal end of the right collarbone, and one electrode placed on the left lateral base of the chest. Prior to analysis, the inter-beat intervals were converted to Heart Rate (HR) in beats per minute per real-time epoch (1 s). The 15 second-by-second post-item HR values were baseline-corrected by subtracting the average pre-item baseline HR value (mean HR in the three seconds preceding item onset), resulting in 15 post-item difference scores (ΔHR). The average of these 15 scores was used as the HR deceleration dependent measure.
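The two response definitions above can be illustrated with a short Python sketch. It is not the Vsrrp89 implementation: the function names are hypothetical, and reading "maximal increase" as the largest rise from a preceding minimum within the 1 s to 5 s window is an assumption.

```python
from statistics import mean

def scr_amplitude(samples, start=1.0, end=5.0):
    """Largest rise in conductance within [start, end] s after item onset.

    `samples` is a list of (time_in_s, conductance) pairs relative to
    item onset. Assumption: the increase is measured from the lowest
    preceding value inside the window.
    """
    window = [value for t, value in samples if start <= t <= end]
    if not window:
        return 0.0
    lowest = window[0]
    best = 0.0
    for value in window:
        lowest = min(lowest, value)
        best = max(best, value - lowest)
    return best

def hr_deceleration(pre_item_hr, post_item_hr):
    """Mean baseline-corrected HR (delta HR) over the post-item seconds.

    `pre_item_hr` holds the per-second HR values before item onset
    (three in the text), `post_item_hr` the per-second values after
    onset (fifteen in the text).
    """
    baseline = mean(pre_item_hr)
    deltas = [hr - baseline for hr in post_item_hr]
    return mean(deltas)

# Hypothetical values: HR drops by 2 bpm after the item.
example_hr = hr_deceleration([70, 70, 70], [68] * 15)
```

A negative value from `hr_deceleration` indicates that heart rate slowed relative to the pre-item baseline.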
To eliminate individual differences in raw response patterns, we used within-subject standardized scores for each physiological measure. For each participant we computed the response to the critical items relative to the mean and the standard deviation of the total response distribution within each block of two questions (i.e., before and after the self-paced break), as described in Ben-Shakhar and Elaad (2002). Buffer and catch items were not included in the standardization procedure (see klein Selle et al., 2016, 2017). As the dependent measure, CIT-detection scores were calculated for each participant and each physiological measure by averaging the standardized score of all critical items. Since recognition of the critical item results in heart rate deceleration, HR CIT-scores are multiplied by -1 prior to analysis. Therefore, for both SCR and HR, a positive CIT-score is indicative of enhanced physiological responding upon the presentation of the critical item (i.e., concealed information recognition).
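As a rough sketch of this scoring step, the following Python function standardizes responses within each block and averages the standardized scores of the critical items. The function name and data layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def cit_detection_score(blocks, reverse=False):
    """Mean within-block z-score of the critical items.

    `blocks` is a list of blocks; each block is a list of
    (item_type, response) pairs holding critical and control items only
    (buffer and catch items are left out beforehand). Set `reverse=True`
    for HR so that deceleration yields a positive score.
    """
    z_critical = []
    for block in blocks:
        values = [response for _, response in block]
        if len(values) < 2:
            continue  # not enough data to standardize
        m, sd = mean(values), stdev(values)  # assumption: sample SD
        if sd == 0:
            continue  # no variability in this block
        z_critical += [(r - m) / sd for kind, r in block if kind == "critical"]
    if not z_critical:
        return None
    score = mean(z_critical)
    return -score if reverse else score

# Hypothetical block: one critical item with a larger response than controls.
block = [("critical", 2.0), ("control", 1.0), ("control", 1.0),
         ("control", 1.0), ("control", 1.0)]
score = cit_detection_score([block])
```

With `reverse=True` the same function returns the sign-flipped score, mirroring the multiplication by -1 described for HR.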

Exclusion criteria
On participant level, physiological data were eliminated from analyses when anomalies occurred (for HR, n = 1). For exclusions on trial level, for each of the dependent measures, item-specific responses were removed if the standardized score was smaller than -5 or larger than 5, reflecting [...]. When a movement coincided with a positive standardized score (for SCR) or a standardized score larger than 2 or lower than -2 (for HR), the item was discarded from analyses (see also Geven, klein Selle, Ben-Shakhar, Kindt, & Verschuere, 2018).
Further exclusions on response level were performed when participants showed a standard deviation of the raw SCR scores below 0.01 during the presentation of a block (i.e., 4 questions). In these cases, all SCR measurements from that block were discarded from further analyses due to non-responsiveness. Following these exclusion criteria, 96 % (range from 50 % to 100 %) of the SCR and 97 % (range from 95 % to 100 %) of the HR data were included in the analyses.
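Two of these criteria, the ±5 cut-off on standardized scores and the minimum variability of raw SCR scores within a block, lend themselves to a compact sketch. The function names are hypothetical, the movement-based criterion is not modeled, and the use of the sample standard deviation is an assumption.

```python
from statistics import stdev

def drop_extreme_items(z_scores, limit=5.0):
    """Keep only standardized scores within [-limit, limit]."""
    return [z for z in z_scores if -limit <= z <= limit]

def block_is_responsive(raw_scr, min_sd=0.01):
    """True if the raw SCR scores of a block vary enough to be analyzed.

    Blocks whose standard deviation falls below `min_sd` are treated as
    non-responsive. Assumption: sample SD.
    """
    return len(raw_scr) > 1 and stdev(raw_scr) >= min_sd

# Hypothetical values for illustration.
kept = drop_extreme_items([-6.0, 0.5, 5.2, 4.9])
flat_block_ok = block_is_responsive([0.000, 0.001, 0.002])
```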

Results
All analyses used an alpha level of 0.05. Effect sizes for the independent-samples t-tests are reported using Cohen's d. In addition, JZS Bayes factors (BF) were computed using JASP software version 0.8.4, representing numerical values quantifying the odds ratio between the null and the alternative hypothesis given the data. BF01 denotes how much more likely the data are under the null as compared to the alternative hypothesis, and BF10 denotes how much more likely the data are under the alternative as compared to the null hypothesis. For one-tailed testing, Bayes factors are reported as supporting either the null (BF0+) or the alternative hypothesis (BF+0). A JZS prior with scaling factor r = 1.000 was used for the alternative hypothesis (see Rouder, Speckman, Sun, Morey, & Iverson, 2009). It should be noted that values close to 1 fail to support either hypothesis, and that Bayes factors represent a relative comparison of one hypothesis over the other, while both could be false.

True versus false confessions
It was expected that true confessors, but not false confessors, would display physiological evidence of recognizing intimate details of the transgression in the CIT. We quantified the mean detection score such that positive CIT-scores provide physiological evidence of recognition. Therefore, it was expected that for both physiological measures the CIT-score to the critical items would be significantly higher for true confessors compared to false confessors. This was analyzed with a one-tailed independent-samples t-test with Condition as the grouping variable and the CIT-score as the dependent measure. Fig. 3 shows the SCR and HR CIT-scores for false confessors and true confessors.
Figure note. True confessors, but not false confessors, were hypothesized to show physiological responses indicating recognition of the critical item (e.g., Asia) in comparison to control items (e.g., South-America, North-America, Africa, and Antarctica).
For the SCR, a significantly higher CIT-score was revealed for true confessors (M = 0.75, SD = 0.43) than for false confessors (M = -0.01, SD = 0.36), t(46 [...]
Within each of the two conditions, we investigated whether participants exhibited significantly larger responses to the critical item compared to the control items (i.e., CIT-score > 0). For this purpose, a one-tailed one-sample t-test was conducted for true confessors and a two-tailed one-sample t-test for false confessors. We expected that a CIT-score significantly larger than zero would be observed for true confessors, reflecting recognition, whereas the response to the critical items should not differ from the control items amongst false confessors.

Individual detection accuracy
To analyze the detection efficiency of classifying participants as true or false confessors, we compared the distribution of the CIT-scores for both groups. Receiver operating characteristic (ROC) curves were constructed for each physiological measure separately, depicting the true positive rate versus the false positive rate for every possible classification threshold (see Supplementary Appendix C). As an outcome measure, we computed the area under the curve (a) using the mean CIT-score as the dependent variable and Condition as the state variable. This way, the accuracy of the CIT to classify true confessors and false confessors was calculated. The value of the area under the curve varies between 0 and 1, with a value of 0.5 representing classification at chance level and a value of 1 reflecting perfect accuracy in separating true from false confessors based on their physiological responding. For both physiological measures, the area under the ROC curve was expected to be significantly above chance level, inferred when the lower boundary of the 95 % confidence interval is higher than chance (i.e., a value of 0.50).
Analyses revealed that detection efficiency was significantly larger than chance classification, with a = 0.90 [0.81; 0.99] for SCR and a = 0.77 [0.63; 0.90] for HR.
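The area under the ROC curve equals the probability that a randomly chosen true confessor has a higher CIT-score than a randomly chosen false confessor, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the example scores are hypothetical, and the confidence intervals reported above are not reproduced here.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the proportion of (positive, negative) pairs in which the
    positive case scores higher; ties count as 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical CIT-scores, not data from the study.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.1, 0.0])
```

A value of 0.5 from this function corresponds to chance-level classification and 1 to perfect separation of the two groups.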

Memory
All but one of the true confessors recalled both correct answers of the quiz (M = 1.96, SD = 0.19); none of the false confessors was able to do so (M = 0.00, SD = 0.00). No statistical analyses could therefore be conducted for recall. Among true confessors, 100 % correctly chose all trivia answers in a recognition task. In contrast, only four out of 20 false confessors chose the correct answer 'Snapchat' and three out of 20 false confessors correctly picked 'Asia'. A one-tailed independent-samples t-test revealed that true confessors (M = 3.96, SD = 0.19) had a significantly higher recognition rate than false confessors (M = 0.65, SD = 1.14), t(19.75) = 12.91, p < 0.001, d = 4.08, 95 % CI [2.66; ∞], BF+0 = 3.56e+16.
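The fractional degrees of freedom suggest a Welch (unequal-variance) t-test. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom from summary statistics. The group size of 28 for true confessors is an assumption (it equals the number of guilty participants in the final sample; this section does not state how many confessed), and the function name is hypothetical.

```python
from math import sqrt

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Recognition scores; n1 = 28 is an assumption, n2 = 20 is taken from the text.
t_value, df_value = welch_t(3.96, 0.19, 28, 0.65, 1.14, 20)
```

Under this assumption the result lies close to, though not exactly on, the reported t and degrees of freedom; small differences are to be expected because the inputs are rounded summary statistics.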

Motivation
Two-tailed independent-samples t-tests revealed that true confessors reported significantly more motivation to hide knowledge of the correct answers compared to false confessors. Additionally, true confessors reported significantly more effort to suppress and enhance physiological reactions during the test compared to false confessors. No significant differences were found between true and false confessors in the reported focus on the computer screen during the CIT, general involvement in the study, or memory for the answers of the trivia quiz. Table 1 shows the descriptive statistics.

Stress
Two-tailed independent-samples t-tests revealed that false confessors reported a significantly more negative experience than true confessors. Additionally, no significant differences were found between true and false confessors in the reported stress experienced during the interrogation. Two-tailed independent-samples t-tests revealed no significant differences between false confessors and true deniers regarding their experience and stress during the interrogation. Table 2 shows the descriptive statistics.

Discussion
In the current study, we examined the potential of the CIT to evaluate the veracity of confessions. The CIT uses physiological measures to verify whether the suspect recognizes critical details derived from the crime scene investigation that would be known to the perpetrator. As a consequence, innocent suspects should not show physiological signs of recognition. The CIT produced an effect of d = 1.88 for the oldest and most often used measure in CIT research (SCR). This means that true confessors, but not false confessors, displayed physiological changes that are associated with intimate knowledge of the experimental transgression. The results indicate that, in the current study, there is a 91 % chance that a true confessor will have a higher SCR score than a person randomly picked from the false confessors.
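The step from d = 1.88 to a 91 % chance can be made explicit with a short sketch. Assuming that both groups' scores are normally distributed with equal variance (an assumption; the text does not state how the percentage was derived), the probability that a randomly chosen true confessor outscores a randomly chosen false confessor is Φ(d / √2), where Φ is the standard normal cumulative distribution function. The function name is illustrative.

```python
import math

def probability_of_superiority(d):
    """P(score_true > score_false) for two equal-variance normal groups
    whose means differ by d standard deviations: Phi(d / sqrt(2))."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), hence
    # Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + math.erf(d / 2.0))

# d for SCR as reported in the text.
p_scr = probability_of_superiority(1.88)
print(f"{p_scr:.2f}")
```

Under these assumptions the value comes out at roughly 0.91, in line with the percentage given above and close to the area under the curve reported for SCR.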
The success of this laboratory study raises the question whether the CIT can and should be used in the forensic field to falsify confessions. A key challenge for the application of the CIT in real-life investigative interviews is the fact that false confessions often contain information that is presumed to represent the perpetrator's knowledge of the crime (Garrett, 2010, 2015). Indeed, in a sample of 66 false confessions in DNA exoneration cases in the United States, the vast majority of false confessions contained details about the crime that were both accurate and not in the public domain (Garrett, 2015). In these instances, the presence of these details indicates that police had contaminated the process of interrogation by communicating information, providing secondhand guilty knowledge to otherwise innocent suspects (for an example, see Trainum, 2014).
It is critical to remember that the CIT assesses recognition of crime scene information, not guilt (Ogawa, Matsuda, & Verschuere, 2015). Results of several studies have revealed that the mere presence of knowledge suffices to elicit a CIT effect, irrespective of the source of the memory (e.g., Bradley, Barefoot, & Arsenault, 2011). In a mock-crime study, Bradley and Rettinger (1992) split participants into three groups in which the origin of critical information was manipulated. Participants who had committed a mock-crime could be clearly separated from innocents. Yet, those who were informed of the critical details without actually committing the act showed a larger CIT effect in comparison to innocents. While informed innocents could still be distinguished from guilty participants on a group level, individual classification on a case-to-case basis remains challenging. If no procedures are in place to prevent or observe the process of contamination in the interrogation room (see Alceste et al., 2020), such as recording police interrogations to document the original source of each detail appearing in the confession, innocents might be classified as knowledgeable in the CIT.
Under what conditions, then, may the CIT still be of use to evaluate the veracity of confessions? Only when investigators withhold detailed crime scene information during an investigation, so that the suspect's recognition of intimate details as revealed by the CIT would have high evidential value (Osugi, 2011). In fact, the same conditions are necessary for applications of the CIT in forensic investigations. In case of partial contamination, there may remain non-contaminated details, or the CIT could consist of more detailed items (see also Osugi, 2018). Consider the case of Brendan Dassey, in which investigators asked leading and highly suggestive questions about what had happened to the victim's head (Ricciardi & Demos, 2015). After several incorrect guesses, the detective revealed to Dassey that the victim was shot. In a subsequent CIT, questions could still be asked about the number of shots, as well as the specific firearm used. Initial findings regarding this matter reveal promising results, showing that probing for more specific exemplar-level details (e.g., revolver) as compared to possibly contaminated broader information (e.g., firearm) might reduce the risk that contamination negatively affects CIT classification (Geven, Ben-Shakhar, Kindt, & Verschuere, 2019).
Moreover, using pictorial items enlarges the flexibility of the CIT (Lykken, 1998) and has been previously found suitable to detect concealed information (Ambach, Bursch, Stark, & Vaitl, 2010; Seymour & Kerlin, 2008). The presentation of pictorial items on an exemplar level (e.g., images of various revolvers) may therefore provide a safeguard for innocents contaminated with categorical verbal information (e.g., that the victim was shot). However, critical items may be less discernible from control items at an exemplar level than at a categorical level (Osugi, 2018), and further research has to specifically validate the use of visual exemplar-level details.

Limitations and future directions
This study is not without limitations. First, in the current paradigm, the transgression consisted of participants helping a confederate on a task that should have been solved individually. Questions can be raised about the generalizability of a CIT regarding a minor, provoked transgression compared to high-stake serious offenses. Moreover, it should be noted that the transgression in the current paradigm was a prosocial act, not an antisocial act, as it was the confederate who initiated the cheating.
Second, the cover story was mainly held up for the first three parts of the experiment: the personality questionnaires, the problem-solving tasks, and, most importantly, the accusation and subsequent confession. For the final part of the experiment, all participants were informed that a deception detection test would be used to indicate whether they had cheated during the trivia quiz. Therefore, this information did not influence the decision to confess but was designed to ensure all participants responded with a verbal 'no' upon item presentation in the CIT. As guilty suspects are usually intrinsically motivated to conceal critical information, participants were explicitly instructed and incentivized to conceal their possible knowledge. For future research, participants could be instructed to verbally repeat all alternatives, as this ensures active item processing without explicit deception.
Third, in the original paradigm by Russano et al. (2005), the confederate either actively sought assistance from participants or did not, resulting in a random assignment of guilt. In contrast, all participants in the current study were tempted by the confederate, thereby leading to self-selected rule-breaking behavior. Whereas random assignment to condition is an important feature to exclude confounding variables in research, this also leads to loss of ecological validity. Especially in the field of legal psychology, deception and rule-breaking behavior are typically self-selected. Crucially, self-initiated cheating does not undermine detection efficiency in the CIT compared to instructed cheating (Geven et al., 2018).
Fourth, while we tested a student sample and investigated situational factors used to obtain confessions, we did not consider the impact of dispositional characteristics that might render some suspects particularly vulnerable (e.g., intellectual disability; Appelbaum & Appelbaum, 1994; see also Clare & Gudjonsson, 1995; Gross, Jacoby, Matheson, Montgomery, & Patil, 2005; Kassin, Perillo, Appleby, & Kukucka, 2015). In the sample of 66 DNA exoneration cases investigated by Garrett (2015), more than a third of the confessions were obtained from juveniles; 22 exonerees had a diagnosed intellectual disability or mental illness. Previous research suggests that the CIT can be reliably used with children and juveniles (Visu-Petra, Jurje, Ciornei, & Visu-Petra, 2016) as well as in a general prison sample (Suchotzki, Kakavand, & Gamer, 2018). Besides improving special procedures for interviewing vulnerable suspects, the validity of the CIT with vulnerable suspects (e.g., cognitive impairments) should be further investigated.

Conclusions
An admission of guilt is considered the most probative evidence at trial that by itself increases the chance of a conviction (Leo & Ofshe, 1998). In light of the number of wrongful convictions in which false confessions were a contributing factor, it is of utmost importance to verify whether the confessor is factually guilty of the crime or gave a false incriminating statement. However, it can be a challenging task to reliably differentiate between true and false confessions in the criminal justice system. The current findings reveal initial evidence that the CIT may be able to distinguish true from false confessions. While further research is needed before the CIT could be implemented in law enforcement settings, the present demonstration that confessing cheaters could be meaningfully distinguished from innocent false confessors by verifying the presence of crime-related knowledge is a promising starting point. Important restrictions with regard to the contamination of suspects in the field must be considered, but we point to possible future usage of the CIT under specific circumstances.

Fig. 1. Flow of the experimental procedure creating true and false confessions. Note. The comparison of interest concerns the CIT-scores of the true and false confessors.

Fig. 2. Visual representation of the course of CIT questions. Note. True confessors, but not false confessors, were hypothesized to show physiological responses indicating recognition of the critical item (e.g., Asia) in comparison to control items (e.g., South-America, North-America, Africa, and Antarctica).

Fig. 3. Mean CIT-scores for true and false confessors. Note. Error bars represent standard errors.

Table 1
Descriptive statistics of the motivation questionnaire items (5-point Likert scale).

Table 2
Descriptive statistics of the stress questionnaire items (7-point Likert scale).