This document, for over two years, was hosted on the University of Oregon web server. In Autumn of 1998 it was no longer available through that source. This paper is part of a group of papers, all related to the same very public and very controversial report. It would be a form of bias to make the other papers available when this one is not. Since this document is no longer available via link from Dr. Hyman's university, we are providing a locally-hosted copy for review. -- Webmaster
September 11, 1995
INTRODUCTION
Professor Jessica Utts and I were given the task of evaluating the program on "Anomalous Mental Phenomena" carried out at SRI International (formerly the Stanford Research Institute) from 1973 through 1989 and continued at SAIC (Science Applications International Corporation) from 1992 through 1994. We were asked to evaluate this research in terms of its scientific value. We were also asked to comment on its potential utility for intelligence applications.
The investigators use the term Anomalous Mental Phenomena to refer to what the parapsychologists label as psi. Psi includes both extrasensory perception (called Anomalous Cognition by the present investigators) and psychokinesis (called Anomalous Perturbation by the present investigators). The experimenters claim that their results support the existence of Anomalous Cognition--especially clairvoyance (information transmission from a target without the intervention of a human sender) and precognition. They found no evidence for the existence of Anomalous Perturbation.
Our evaluation will focus on the 10 experiments conducted at SAIC. These are the most recent in the program as well as the only ones for which we have adequate documentation. The earlier SRI research on remote viewing suffered from methodological inadequacies. Another reason for concentrating upon this more recent set of experiments is the limited time frame allotted for this evaluation.
I will not ignore entirely the earlier SRI research. I will also consider some of the contemporary research in parapsychology at other laboratories. This is because a proper scientific evaluation of any research program has to place it in the context of the broader scientific community. In addition, some of this contemporary research was subcontracted by the SAIC investigators.
Professor Utts has provided an historical overview of the SRI and SAIC programs as well as descriptions of the experiments under consideration. I will not duplicate what she has written on these topics. Instead, I will focus on her conclusions that:
Using the standards applied to any other area of science, it is concluded that psychic functioning has been well established. [Utts, Sept. 1995, p 1]
Arguments that these results could be due to methodological flaws in the experiments are soundly refuted. Effects of similar magnitude to those found in government-sponsored research at SRI and SAIC have been replicated at a number of laboratories across the world. Such consistency cannot be readily explained by claims of flaws or fraud. [Utts, Sept. 1995, p 1]
Because my report will emphasize points of disagreement between Professor Utts and me, I want to state that we agree on many other points. We both agree that the SAIC experiments were free of the methodological weaknesses that plagued the early SRI research. We also agree that the SAIC experiments appear to be free of the more obvious and better known flaws that can invalidate the results of parapsychological investigations. We agree that the effect sizes reported in the SAIC experiments are too large and consistent to be dismissed as statistical flukes.
I also believe that Jessica Utts and I agree on what the next steps should be.
We disagree on key questions such as:
1. Do these apparently non-chance effects justify concluding that the existence of anomalous cognition has been established?
2. Has the possibility of methodological flaws been completely eliminated?
3. Are the SAIC results consistent with the contemporary findings in other parapsychological laboratories on remote viewing and the ganzfeld phenomenon?
The remainder of this report will try to justify why I believe the answer to these three questions is "no."
SCIENTIFIC STATUS OF THE PROGRAM
Science is basically a communal activity. For any developed field of inquiry, a community of experts exists. This community provides the disciplinary matrix which determines what questions are worth asking, which issues are relevant, what variables matter and which can be safely ignored, and the criteria for judging the adequacy of observational data. The community provides checks and balances through the referee system, open criticism, and independent replications. Only those relationships that are reasonably lawful and replicable across independent laboratories become part of the shared scientific store of "knowledge."
An individual investigator or laboratory can contribute to this store. However, by itself, the output of a single investigator or laboratory does not constitute science. No matter how careful and competent the research, the findings of a single laboratory count for nothing unless they can be reliably replicated in other laboratories. This rule is true of ordinary claims. It holds true especially for claims that add something new or novel to the existing database. When an investigator, for example, announces the discovery of a new element, the claim is not accepted until the finding has been successfully replicated by several independent laboratories. Of course, this rule is enforced even more when the claim has revolutionary implications that challenge the fundamental principles underlying most sciences.
GENERAL SCIENTIFIC HANDICAPS OF THE SAIC PROGRAM
The brief characterization of scientific inquiry in the preceding section alerts us to serious problems in trying to assess the scientific status of the SAIC research. The secrecy under which the SRI and SAIC programs were conducted necessarily cut them off from the communal aspects of scientific inquiry. The checks and balances that come from being an open part of the disciplinary matrix were absent. With the exception of the past year or so, none of the reports went through the all-important peer-review system. Worse, promising findings did not have the opportunity of being replicated in other laboratories.
The commendable improvements in protocols, methodology, and data-gathering have not profited from the general shake-down and debugging that comes mainly from other laboratories trying to use the same improvements. Although the research program that started in 1973 continued for over twenty years, the secrecy and other constraints have produced only ten adequate experiments for consideration. Unfortunately, ten experiments--especially from one laboratory (considering the SAIC program as a continuation of the SRI program)--are far too few to establish reliable relationships in almost any area of inquiry. In the traditionally elusive quest for psi, ten experiments from one laboratory promise very little in the way of useful conclusions.
The ten SAIC experiments suffer another handicap in their quest for scientific status. The principal investigator was not free to run the program to maximize scientific payoff. Instead, he had to do experiments and add variables to suit the desires of his sponsors. The result was an attempt to explore too many questions with too few resources. In other words, the scientific inquiry was spread too thin. The 10 experiments were asked to provide too many sorts of information.
For these reasons, even before we get to the details (and remember the devil is usually in the details), the scientific contribution of this set of studies will necessarily be limited.
PARAPSYCHOLOGY'S STATUS AS A SCIENCE
Parapsychology began its quest for scientific status in the mid-1800s. At that time it was known as psychical research. The Society for Psychical Research was founded in London in 1882. Since that time, many investigators--including at least four Nobel laureates--have tried to establish parapsychology as a legitimate science. Beginning in the early 1930s, J.B. Rhine initiated an impressive program to distance parapsychology from its tainted beginnings in spiritualistic seances and turn it into an experimental science. He pulled together various ideas of his predecessors in an attempt to make the study of ESP and PK a rigorous discipline based on careful controls and statistical analysis.
His first major publication caught the attention of the scientific community. Many were impressed with this display of a huge database, gathered under controlled conditions, and analyzed with the most modern statistical tools. Critics quickly attacked the statistical basis of the research. However, Burton Camp, the president of the Institute of Mathematical Statistics, came to the parapsychologists' defense in 1937. He issued a statement that if the critics were going to fault parapsychological research they could not do so on statistical grounds. The critics then turned their attention to methodological weaknesses. Here they had more success.
What really turned scientists against parapsychological claims, however, was the fact that several scientists failed to replicate Rhine's results. This problem of replicability has plagued parapsychology ever since. The few, but well-publicized, cheating scandals that were uncovered also worked against parapsychology's acceptance into the general scientific community.
Parapsychology shares with other sciences a number of features. The database comes from experiments using controlled procedures, double-blind techniques where applicable, the latest and most sophisticated apparatus, and sophisticated statistical analysis. In addition, the findings are reported at annual meetings and in refereed journals.
Unfortunately, as I have pointed out elsewhere, parapsychology has other characteristics that make its status as a normal science problematic. Here I will list only a few. These are worth mentioning because they impinge upon the assessment of the scientific status of the SAIC program. Probably the most frequently discussed problem is the issue of replicability. Both critics and parapsychologists have agreed that the lack of consistently replicable results has been a major reason for parapsychology's failure to achieve acceptance by the scientific establishment.
Some parapsychologists have urged their colleagues to refrain from demanding such acceptance until they can put examples of replicable experiments before the scientific community. The late parapsychologist J.G. Pratt went further and argued that parapsychology would never develop a replicable experiment. He argued that psi was real but would forever elude deliberate control. More recently, the late Honorton claimed that the ganzfeld experiments had, indeed, achieved the status of a replicable paradigm. The title of the landmark paper in the January 1994 issue of the Psychological Bulletin by Bem and Honorton is "Does psi exist? Replicable evidence for an anomalous process of information transfer." In her position paper "Replication and meta-analysis in parapsychology" (Statistical Science, 1991, 6, pp. 363-403), Jessica Utts reviews the evidence from meta-analyses of parapsychological research to argue that replication has been demonstrated and that the overall evidence indicates that there is an anomalous effect in need of explanation.
In evaluating the SAIC research, Utts points to the consistency of effect sizes produced by the expert viewers across experiments as well as the apparent consistency of average effect sizes of the SRI and SAIC experiments with those from other parapsychological laboratories. These consistencies in effect sizes across experiments and laboratories, in her opinion, justify the claim that anomalous mental phenomena can be reliably replicated with appropriately designed experiments. This is an important breakthrough for parapsychology, if it is true. However, to anticipate some of my later commentary, I wish to emphasize that simply replicating effect size is not the same thing as showing the repeated occurrence of anomalous mental phenomena. Effect size is nothing more than a standardized difference between an observed and an expected outcome hypothesized on the basis of an idealized probability model. An indefinite number of factors can cause departures from the idealized probability model. An investigator needs to go well beyond the mere demonstration that effect sizes are the same before he/she can legitimately claim that they are caused by the same underlying phenomenon.
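To make this concrete, here is a minimal sketch of one common convention for computing such an effect size from a hit-rate experiment. The specific estimator and the numbers in the example are my own illustration and are not taken from the SAIC reports:

    import math

    def effect_size(hits, trials, chance_p):
        # Per-trial standardized departure of the observed hit rate from the
        # hit rate assumed by the idealized chance model (one common convention).
        observed = hits / trials
        return (observed - chance_p) / math.sqrt(chance_p * (1 - chance_p))

    # e.g., 34 hits in 100 four-choice trials, where the chance model assumes 25%:
    print(effect_size(34, 100, 0.25))   # roughly 0.21

Any influence that nudges the observed hit rate away from the assumed chance value--a slightly wrong chance probability, dependence between trials, selection effects, or a genuine anomaly--produces a non-zero value of exactly this kind, which is why equal effect sizes need not reflect equal causes.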
In my opinion, a more serious challenge to parapsychology's quest for scientific status is the lack of cumulativeness in its database. Only parapsychology, among the fields of inquiry claiming scientific status, lacks a cumulative database. Physics has changed dramatically since Newton conducted his famous experiment using prisms to show that white light contained all the colors of the spectrum. Yet, Newton's experiment is still valid and still yields the same results. Psychology has changed its ideas about the nature of memory since Ebbinghaus conducted his famous experiments on the curve of forgetting in the 1880s. We believe that memory is more dynamic and complicated than can be captured by Ebbinghaus' ideas about a passive, rote memory system. Nevertheless, his findings still can be replicated and they form an important part of our database on memory.
Parapsychology, unlike the other sciences, has a shifting database. Experimental data that one generation puts forth as rock-solid evidence for psi is discarded by later generations in favor of new data. When the Society for Psychical Research was founded in 1882, its first president, Henry Sidgwick, pointed to the experiments with the Creery sisters as the evidence that should convince even the most hardened skeptic of the reality of psi. Soon, he and the other members of the Society argued that the data from the Smith-Blackburn experiments provided the fraud-proof case for the reality of telepathy. The next generation of psychical researchers, however, cast aside these cases as defective and we no longer hear about them. Instead, they turned to new data to argue their case.
During the 1930s and 1940s, the results of Rhine's card guessing experiments were offered as the solid evidence for the reality of psi. The next generation dropped Rhine's data as being flawed and difficult to replicate and it hailed the Soal-Goldney experiments as the replicable and rock-solid basis for the existence of telepathy. Next came the Sheep-Goat experiments. Today, the Rhine data, the Sheep-Goat experiments, and the Soal-Goldney experiments no longer are used to argue the case for psi. Contemporary parapsychologists, instead, point to the ganzfeld experiments, the random-number generator experiments, and--with the declassifying of the SAIC experiments--the remote viewing experiments as their basis for insisting that psi exists.
Professor Utts uses the ganzfeld data and the SAIC remote viewing results to assert that the existence of anomalous cognition has been proven. She does not completely discard earlier data. She cites meta-analyses of some of the earlier parapsychology experiments. Still, the cumulative database for anomalous mental phenomena does not exist. Most of the data accumulated by previous investigators has been discarded. In most cases the data have been discarded for good reasons. They were subsequently discovered to be seriously flawed in one or more ways that were not recognized by the original investigators. Yet, at the time they were part of the database, the parapsychologists were certain that they offered incontestable evidence for the reality of psi.
How does this discussion relate to our present concerns with the scientific status of the SAIC program? This consideration of the shifting database of parapsychology offers a cautionary note to the use of contemporary research on the ganzfeld and remote viewing as solid evidence for anomalous mental phenomena. More than a century of parapsychological research teaches us that each generation of investigators was sure that it had found the `Holy Grail'--the indisputable evidence for psychic functioning. Each subsequent generation has abandoned its predecessors' evidence as defective in one way or another. Instead, the new generation had its own version of the Holy Grail.
Today, the parapsychologists offer us the ganzfeld experiments and, along with Jessica Utts, will presumably include the SAIC remote viewing experiments as today's reasons for concluding that anomalous cognition has been demonstrated. Maybe this generation is correct. Maybe this time the "indisputable" evidence will remain indisputable for subsequent generations. However, it is too soon to tell. Only history will reveal the answer. As E.G. Boring once wrote, when writing about the Soal-Goldney experiments, you cannot hurry history.
Meanwhile, as I will point out later in this report, there are hints and suggestions that history may repeat itself. Where Utts sees consistency and incontestable proof, I see inconsistency and hints that all is not as rock-solid as she implies.
I can list other reasons to suggest that parapsychology's status as a science is shaky, at best. Some of these reasons will emerge as I discuss specific aspects of the SAIC results and their relation to other contemporary parapsychological research.
THE CLAIM THAT ANOMALOUS COGNITION EXISTS
Professor Utts concludes that "psychic functioning has been well established." She bases this conclusion on three other claims: 1) the statistical results of the SAIC and other parapsychological experiments "are far beyond what is expected by chance"; 2) "arguments that these results could be due to methodological flaws are soundly refuted"; and 3) "Effects of similar magnitude to those found in government-sponsored research at SRI and SAIC have been replicated at a number of laboratories across the world."
Later in this report, I will raise questions about her major conclusion and the three supporting claims. In this section, I want to unpack just what these claims entail. I will start with the statistical findings. Parapsychology is unique among the sciences in relying solely on significant departures from a chance baseline to establish the presence of its alleged phenomenon. In the other sciences the defining phenomena can be reliably observed and do not require indirect statistical measures to justify their existence. Indeed, each branch of science began with phenomena that could be observed directly. Gilbert began the study of magnetism by systematically studying a phenomenon that had been observed and was known to the ancients as well as his contemporaries. Modern physics began by becoming more systematic about moving objects and falling bodies. Psychology became a systematic science by looking for lawful relationships among sensory discriminations. Another starting point was the discovery of lawful relationships in the remembering and forgetting of verbal materials. Note that in none of these cases was the existence of the defining phenomena in question. No one required statistical tests and effect sizes to decide if magnetism was present or if a body had fallen. Psychophysicists did not need to reject a null hypothesis to decide if sensory processes were operating, and memory researchers did not have to rely on reaching accepted levels of significance to know if recall or forgetting had occurred.
Each of the major sciences began with phenomena whose presence was not in question; the existence of the primary phenomena was never in doubt. Each science began by finding systematic relationships among variations in the magnitudes of attributes of the central phenomena and the attributes of independent variables such as time, location, etc. The questions for the investigation of memory had to do with how best to describe the forgetting curve and what factors affected its parameters. No statistical tests or determination of effect sizes were required to decide if, in fact, forgetting was or was not present on any particular occasion.
Only parapsychology claims to be a science on the basis of phenomena (or a phenomenon) whose presence can be detected only by rejecting a null hypothesis. To be fair, parapsychologists also talk about doing process research where the emphasis is on finding systematic relationships between attributes of psi and variations in some independent variable. One conclusion from the SRI/SAIC project, for example, is that there is no relationship between the distance of the target from the viewer and the magnitude of the effect size for anomalous cognition. However, it is still the case that the effect size, and even the question of whether anomalous cognition was present in any experiment, is a matter of deciding if a departure from a chance baseline is non-accidental.
At this point I think it is worth emphasizing that the use of statistical inference to draw conclusions about the null hypothesis assumes that the underlying probability model adequately represents the distributions and variations in the real world situation. The underlying probability model is an idealization of the empirical situation for which it is being used. Whether or not the model is appropriate for any given application is an empirical matter and the adequacy of the model has to be justified for each new application. Empirical studies have shown that statistical models fit real world situations only approximately. The tails of real-world distributions, for example, almost always contain more cases than the standard statistics based on the normal curve assume. These departures from the idealized model do not have much practical import in many typical statistical applications because the statistical tests are robust. That is, the departures of the actual situation from the assumed probability model typically do not distort the outcome of the statistical test.
However, when statistical tests are used in situations beyond their ordinary application, they can result in rejections of the null hypothesis for reasons other than a presumed departure from the expected chance value. Parapsychologists often complain that their results fail to replicate because of inadequate power. However, because the underlying probability models are only approximations, too much power can lead to rejections of the null hypothesis simply because the real world and the idealized statistical model are not exact matches. This discussion emphasizes that significant findings can arise for many reasons--including the simple fact that statistical inference is based on idealized models that mirror the real world only approximately.
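The following sketch illustrates this point with invented numbers; the hit rates and trial counts are hypothetical and are not drawn from any of the experiments under review. It simulates a judging task whose true hit rate departs only slightly from the idealized chance value of 25 percent. With enough trials the null hypothesis is rejected in most runs, even though nothing anomalous is present:

    import numpy as np

    def reject_rate(true_p, null_p=0.25, n_trials=20_000, n_experiments=2_000,
                    z_crit=1.645, seed=1):
        # Fraction of simulated experiments that reject the chance baseline with a
        # one-tailed z test (z_crit = 1.645 corresponds to a 0.05 significance level).
        #   true_p : the actual per-trial hit probability in the simulated world
        #   null_p : the hit probability assumed by the idealized chance model
        rng = np.random.default_rng(seed)
        hits = rng.binomial(n_trials, true_p, size=n_experiments)
        se = np.sqrt(n_trials * null_p * (1 - null_p))
        z = (hits - n_trials * null_p) / se
        return float(np.mean(z > z_crit))

    # A modest mismatch between the real world and the idealized model
    # (a true hit rate of 26% where the model assumes 25%):
    print(reject_rate(true_p=0.26, n_trials=20_000))  # most runs reject
    print(reject_rate(true_p=0.26, n_trials=100))     # only slightly above the
                                                      # nominal 5% false-alarm rate

The same departure from the chance model is invisible at low power and "highly significant" at high power; nothing about the departure itself tells us whether it is anomalous.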
I agree with Jessica Utts that the effect sizes reported in the SAIC experiments and in the recent ganzfeld studies probably cannot be dismissed as due to chance. Nor do they appear to be accounted for by multiple testing, file-drawer distortions, inappropriate statistical testing or other misuse of statistical inference. I do not rule out the possibility that some of this apparent departure from the null hypothesis might simply reflect the failure of the underlying model to be a truly adequate model of the experimental situation. However, I am willing to assume that the effect sizes represent true effects beyond inadequacies in the underlying model. Statistical effects, by themselves, do not justify claiming that anomalous cognition has been demonstrated--or, for that matter, that an anomaly of any kind has occurred.
So, I accept Professor Utts' assertion that the statistical results of the SAIC and other parapsychological experiments "are far beyond what is expected by chance." Parapsychologists, of course, realize that the truth of this claim does not constitute proof of anomalous cognition. Numerous factors can produce significant statistical results. Operationally, the presence of anomalous cognition is detected by the elimination of all other possibilities. This reliance on a negative definition of its central phenomenon is another liability that parapsychology brings with its attempt to become a recognized science. Essentially, anomalous cognition is claimed to be present whenever statistically significant departures from the null hypothesis are observed under conditions that preclude the operation of all mundane causes of these departures. As Boring once observed, every success in parapsychological research is a failure. By this he meant that when the investigator or the critics succeed in finding a scientifically acceptable explanation for the significant effect, the claim for ESP or anomalous cognition has failed.
Having accepted the existence of non-chance effects, I turn now to whether these effects have normal causes. Since the beginning of psychical research, each claim that psychic functioning had been demonstrated was countered by critics who suggested other reasons for the observed effects. Typical alternatives that have been suggested to account for the effects have been fraud, statistical errors, and methodological artifacts. In the present discussion I am not considering fraud or statistical errors. This leaves only methodological oversight as the source for a plausible alternative to psychic functioning. Utts has concluded that "arguments that these results could be due to methodological flaws are soundly refuted." If she is correct, then I would have to agree with her bottom line "that psychic functioning has been well established."
Obviously I do not agree that all possibilities for alternative explanations of the non-chance results have been eliminated. The SAIC experiments are well-designed and the investigators have taken pains to eliminate the known weaknesses in previous parapsychological research. In addition, I cannot provide suitable candidates for what flaws, if any, might be present. Just the same, it is impossible in principle to say that any particular experiment or experimental series is completely free from possible flaws. An experimenter cannot control for every possibility--especially for potential flaws that have not yet been discovered.
At this point, a parapsychologist might protest that such "in principle" arguments can always be raised against any findings, no matter how well conceived the study from which they emerged. Such a response is understandable, but I believe my caution is reasonable in this particular case. Historically, many cases of evidence for psi were proffered on the grounds that they came from experiments of impeccable methodological design. Only subsequently, sometimes by fortunate accident, did the possibility of a serious flaw or alternative explanation of the results become available. The founders of the Society for Psychical Research believed that the Smith-Blackburn experiments afforded no alternative to the conclusion that telepathy was involved. They could conceive of no mundane explanation. Then Blackburn confessed and explained in detail just how he and Smith had tricked the investigators.
The critics became suspicious of the Soal-Goldney findings not only because the results were too good, but also because Soal lost the original records under suspicious circumstances. Hansel, Scott, and Price each generated elaborate scenarios to explain how Soal might have cheated. Hansel and Scott reported finding peculiar patterns in the data. The scenarios for accounting for these data, however, were extremely complicated and required the collusion of several individuals--some of whom were prominent statesmen and academics. The discovery of how Soal actually had cheated was made by the parapsychologist Betty Markwick. The finding came about through fortuitous circumstances. The method of cheating turned out to involve only one person and employed an ingenious, but simple, method that none of the critics had anticipated.
During the first four years of the original ganzfeld-psi experiments, the investigators asserted that their findings demonstrated psi because the experimental design precluded any normal alternative. Only after I and a couple of parapsychologists independently pointed out how the use of a single set of targets could provide a mundane alternative to psychic communication did the ganzfeld experimenters realize the existence of this flaw. After careful and lengthy scrutiny of the ganzfeld database, I was able to generate a lengthy list of potential flaws.
Honorton and his colleagues devised the autoganzfeld experiments. These experiments were deliberately designed to preclude the flaws that I and others had eventually discovered in the original ganzfeld database. When the statistically significant results emerged from these latter experiments, they were proclaimed to be proof of anomalous communication because all alternative mundane explanations had been eliminated. When I was first confronted with these findings, I had to admit that the investigators had eliminated all but one of the flaws that I had listed for the original database. For some reason, Honorton and his colleagues did not seem to consider seriously the necessity of insuring that their randomization procedures were optimal. However, putting this one oversight aside, I could find no obvious loopholes in the experiments as reported.
When I was asked to comment on the paper that Daryl Bem and Charles Honorton wrote for the January 1994 issue of the Psychological Bulletin, I was able to get much of the raw data from Professor Bem. My analyses of that data revealed strong patterns that, to me, pointed to an artifact of some sort. One pattern, for example, was the finding that all the significant hitting above chance occurred only on the second or later occurrence of a target. All the first occurrences of a target yielded results consistent with chance. Although this was a post hoc finding, it was not the result of a fishing expedition. I deliberately looked for such a pattern as an indirect way of checking for the adequacy of the randomization procedures. The pattern was quite strong and persisted in every breakdown of the data that I tried--by separate investigator, by target type, by individual experiment, etc. The existence of this pattern by itself does not prove it is the result of an artifact. As expected, Professor Bem seized upon it as another peculiarity of psi. Subsequent to finding this pattern, I have learned about many other weaknesses in this experiment which could have compromised the results. Robert Morris and his colleagues at the University of Edinburgh took these flaws, as well as some additional ones that they uncovered, into account when they designed the ganzfeld replication experiments.
The point of this discussion is that it takes some time before we fully recognize the potential flaws in a newly designed experimental protocol. In some cases, the discovery of a serious flaw is the result of a fortuitous occurrence. In other cases, the uncovering of flaws came about only after the new protocol had been used for a while. Every new experimental design, as is the case for every new computer program, requires a shakedown period and debugging. The problems with any new method or design are not always apparent at first. Obvious flaws may be eliminated only to be replaced by more subtle ones.
How does this apply to the SAIC experiments? These experiments were designed to eliminate the obvious flaws of the previous remote viewing experiments at SRI. Inspection of the protocol indicates that they succeeded in this respect. The new design and methodology, however, have not had a chance to be used in other laboratories or to be properly debugged. Many of the features that could be considered an asset also have possible down sides. I will return to this later in the report when I discuss the use of the same viewers and the same judge across the different experiments. For now, I just want to suggest some general grounds for caution in accepting the claim that all possible methodological flaws have been eliminated.
The third warrant for Jessica Utts' conclusion that psi has been proven is that "Effects of similar magnitude to those found in government-sponsored research at SRI and SAIC have been replicated at a number of laboratories across the world." I will discuss this matter below. For now, I will point out that effects of similar magnitude can occur for several different reasons. Worse, the average effect size from different parapsychological research programs is typically a meaningless composite of arbitrary units. As such, these averages do not represent meaningful parameters in the real world. For example, Honorton claimed that the autoganzfeld experiments replicated the original ganzfeld experiments because the average effect size for both databases was approximately identical. This apparent similarity in average effect size is meaningless for many reasons. For one thing, the similarity in size depends upon which of many possible averages one considers. In the case under consideration the average effect size was obtained by adding up all the hits and trials for the 28 studies in the database. One experimenter contributed almost half to this total. Others contributed in greatly unequal numbers. The average will differ if each experimenter's contribution is given equal weight.
In addition, the heterogeneity of effect sizes among separate investigators is huge. All the effect sizes, for example, of one of the investigators were negative. Another investigator contributed mostly moderately large effect sizes. If the first investigator had contributed more trials to the total, then the average would obviously have been lower. Similar problems exist for the average from the autoganzfeld experiments. In these latter experiments, the static targets--which most closely resembled the overwhelming majority of targets in the original database--yielded an effect size of zero. The dynamic targets yielded a highly significant and moderate effect size. Is the correct average effect size for these experiments based on a composite of the results of the static and dynamic targets or should it be based only on the dynamic targets?
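A small worked sketch shows how much the choice of average matters. The experimenters, hit counts, and trial counts below are invented for illustration and are not the actual ganzfeld figures:

    # Hypothetical studies: (experimenter, hits, trials). Experimenter A
    # contributes roughly half of all trials; experimenter C scores below chance.
    studies = [("A", 122, 400), ("B", 30, 100), ("C", 20, 100)]
    chance = 0.25

    # Pooled average: add up all hits and trials, then compare with chance.
    total_hits = sum(h for _, h, n in studies)
    total_trials = sum(n for _, _, n in studies)
    pooled = total_hits / total_trials - chance

    # Equal-weight average: one vote per experimenter, regardless of trial count.
    per_study = [h / n - chance for _, h, n in studies]
    equal_weight = sum(per_study) / len(per_study)

    print(f"pooled departure from chance:       {pooled:+.3f}")        # about +0.037
    print(f"equal-weight departure from chance: {equal_weight:+.3f}")  # about +0.018

The two defensible averages differ by roughly a factor of two, which is why an agreement in `average effect size' between two databases tells us little until the weighting scheme and the heterogeneity behind it are specified.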
THE SAIC PROGRAM
As I have indicated, the SAIC experiments are an improvement on both the preceding SRI experiments and previous parapsychological investigations. The investigators seem to have taken pains to insure that randomization of targets for presentation and for judging was done properly. They have eliminated the major flaw in the original SRI remote viewing experiments of non-independence in trials for a given viewer. Some of the other features can be considered as improvements but also as possible problems. In this category I would list the use of the same experienced viewers in many experiments and the use of the same target set across experiments. The major limitations that I see in these studies derive from their newness and their having been conducted in secrecy. The newness simply means that we have not had sufficient time to debug and to grasp fully both the strengths and weaknesses of this protocol. The secrecy aggravated this limitation by preventing other investigators from reviewing and criticizing the experiments from the beginning, and by making it impossible for independent laboratories to replicate the findings. (1)
The fact that these experiments were conducted in the same laboratory, with the same basic protocol, using the same viewers across experiments, the same targets across experiments, and the same investigators aggravates, rather than alleviates, the problem of independent replication. If subtle, as-yet-undetected biases and flaws exist in the protocol, the very consistency of elements such as targets, viewers, investigators, and procedures across experiments enhances the possibility that these flaws will be compounded.
Making matters even worse is the use of the same judge across all experiments. The judging of viewer responses is a critical factor in free-response remote viewing experiments. Ed May, the principal investigator, as I understand it, has been the sole judge in all the free response experiments. May's rationale for this unusual procedure was that he is familiar with the response styles of the individual viewers. If a viewer, for example, talks about bridges, May--from his familiarity with this viewer--might realize that this viewer uses bridges to refer to any object that is on water. He could then interpret the response accordingly to make the appropriate match to a target. Whatever merit this rationale has, it results in a methodological feature that violates some key principles of scientific credibility. One might argue that the judge, for example, should be blind not only about the correct target but also about who the viewer is. More important, the scientific community at large will be reluctant to accept evidence that depends upon the ability of one specific individual. In this regard, the reliance on the same judge for all free-response experiments is like the experimenter effect. To the extent that the results depend upon a particular investigator, the question of scientific objectivity arises. Scientific proof depends upon the ability to generate evidence that, in principle, any serious and competent investigator--regardless of his or her personality--can observe.
The use of the same judge across experiments further compounds the problem of non-independence of the experiments. Here, both Professor Utts and I agree. We believe it is important that the remote viewing results be obtainable with different judges. Again, the concern here is that the various factors that are similar across experiments count against their separate findings as independent evidence for anomalous cognition.
HAS ANOMALOUS COGNITION BEEN PROVEN?
Obviously, I do not believe that the contemporary findings of parapsychology, including those from the SRI/SAIC program, justify concluding that anomalous mental phenomena have been proven. Professor Utts and some parapsychologists believe otherwise. I admit that the latest findings should make them optimistic. The case for psychic functioning seems better than it ever has been. The contemporary findings along with the output of the SRI/SAIC program do seem to indicate that something beyond odd statistical hiccups is taking place. I also have to admit that I do not have a ready explanation for these observed effects. Inexplicable statistical departures from chance, however, are a far cry from compelling evidence for anomalous cognition.
So what would be compelling evidence for the reality of anomalous cognition? Let's assume that the experimental results from the SAIC remote viewing experiments continue to hold up. Further assume that along with continued statistical significance no flaws or mundane alternative possibilities come to light. We would then want to ensure that similar results will occur with new viewers, new target pools, and several independent judges. Finally, to satisfy the normal standards of science, we would need to have the findings successfully replicated in independent laboratories by other parapsychologists as well as nonparapsychologists.
If the parapsychologists could achieve this state of affairs, we would be faced with a possible anomaly, but not necessarily anomalous cognition. As the parapsychologist John Palmer has recognized, parapsychologists will have to go beyond demonstrating the presence of a statistical anomaly before they can claim the presence of psychic functioning. This is because, among other things, the existence of a statistical anomaly is defined negatively. Something is occurring for which we have no obvious or ready explanation. This something may or may not turn out to be paranormal. According to Palmer, parapsychologists will have to devise a positive theory of the paranormal before they will be in a position to claim that the observed anomalies indicate paranormal functioning.
Without such a positive theory, we have no way of specifying the boundary conditions for anomalous mental phenomena. Without such a theory we have no way of specifying when psi is present and when it is absent. Because psi or anomalous cognition is currently detected only by departures from a null hypothesis, all kinds of problems beset the claim and pursuit of psychic functioning. For example, the decline effect, which was investigated in one of the SAIC experiments, was once used as an important sign for the presence of psi. J.B. Rhine discovered this effect not only in some of his data but in his re-analyses of data collected by earlier investigators. He attached great importance to this effect because it existed in data whose investigators neither knew of its existence nor had been seeking it. In addition, the decline effect helped Rhine to explain how seemingly null results really contained evidence for psi. This is because the decline effect often showed up as an excess of hitting in the early half of the experiment and as a deficit of hitting in the second half of the experiment. These two halves, when pooled together over the entire experiment, yielded an overall hit rate consistent with chance.
Although Rhine and other parapsychologists attached great importance to the decline effect as a reliable and often hidden sign of the presence of psychic functioning, the reliance on this indicator unwittingly emphasizes serious problems in the parapsychologist's quest. As the SAIC report on binary coding states, the decline effect is claimed for a bewildering variety of possibilities. Some investigators have found a decline effect going from the first quarter to the last quarter of each separate score sheet in their experiment. Other investigators have reported a decline effect as a decrease in hit rate from the first half to the second half of the total experiment. Still others find a decline effect across separate experiments. Indeed, almost any variation where the direction is from a higher hit rate to a lower hit rate has been offered as evidence for a decline effect. To confuse matters further, some investigators have claimed finding evidence for an incline effect.
If the decline effect is a token for the presence of psi, what should one conclude when the data, as was the case in the SAIC experiment on binary coding, show a significant departure from the null hypothesis but no decline effect? We know what the parapsychologists conclude. As long as they get a significant effect, they do not interpret the absence of the decline effect as the absence of psychic functioning. This state of affairs holds as well for several other effects that have been put forth as tokens or signs of anomalous mental functioning. Several such signs are listed in the Handbook of Parapsychology [1977, B.B. Wolman, Editor].
Typically, such signs are sought when the attempt to reject the ordinary null hypothesis fails. Displacement effects are frequently invoked. When his attempts to replicate Rhine's results failed, Soal was persuaded to re-analyze his data in terms of displacement effects. His retrospective analysis uncovered two subjects whose guesses significantly correlated with the target one or two places ahead of the intended target. In his subsequent experiments with these two subjects, one kept hitting on the symbol that came after the intended target while the other produced significant outcomes only when her guesses were matched against the symbol that occurred just before the intended target. Negative hitting, increased variability, and other types of departures from the underlying theoretical probability model have all been used as hidden signs of the presence of psychic functioning.
What makes this search for hidden tokens of psi problematic is the lack of constraints. Any time the original null hypothesis cannot be rejected, the eager investigator can search through the data for one or more of these markers. When one is found, the investigator has not hesitated in offering this as proof of the presence of psi. However, if the null hypothesis is rejected and none of these hidden signs of psi can be found in the data, the investigator still claims the presence of psi. This creates the scientifically questionable situation where any significant departure from a probability model is used as proof of psi but the absence of these departures does not count as evidence against the presence of psi.
So, acceptable evidence for the presence of anomalous cognition must be based on a positive theory that tells us when psi should and should not be present. Until we have such a theory, the claim that anomalous cognition has been demonstrated is empty. Without such a theory, we might just as well argue that what has been demonstrated is a set of effects--each one of which may be the result of an entirely different cause.
Professor Utts implicitly acknowledges some of the preceding argument by using consistency of findings with other laboratories as evidence that anomalous cognition has been demonstrated. I have already discussed why the apparent consistency in average effect size across experiments cannot be used as an argument for consistency of phenomena across these experiments. To be fair, parapsychologists who argue consistency of phenomena across experiments often go beyond simply pointing to consistency in effect sizes.
One example is the claim that certain personality correlates replicate across experiments. May and his colleagues correctly point out, however, that these correlations tend to be low and inconsistent. Recently, parapsychologists have claimed that extroversion correlates positively with successful performance on anomalous cognition tasks. This was especially claimed to be true of the ganzfeld experiments. However, the apparently successful replication of the autoganzfeld experiments by the Edinburgh group [under subcontract to the SAIC program] found that the introverts, if anything, scored higher than the extroverts.
The autoganzfeld experiments produced significant effects only for the dynamic targets. The static targets produced zero effect size. Yet the bulk of the targets in the original ganzfeld database were static and they produced an effect size that was significantly greater than the zero effect size of the autoganzfeld experiments [I was able to demonstrate that there was adequate power to detect an effect size of the appropriate magnitude for the static targets in the autoganzfeld experiments]. Further indication of inconsistency is the SAIC experiment which found that only the static targets produced a significant effect size, whereas the dynamic targets yielded a zero effect size. May and his colleagues speculated that the failure of the dynamic targets was due to a `bandwidth' that was too wide. When they apparently narrowed the bandwidth of the dynamic targets in a second experiment, both dynamic and static targets did equally well. It is unclear whether this should be taken as evidence for consistency or inconsistency. Note that the hypothesis and claim for the autoganzfeld experiments is that dynamic targets should be significantly better than static ones. As far as I can tell the original dynamic targets of the ganzfeld experiments are consistent with an unlimited bandwidth.
Other important inconsistencies exist among the contemporary databases. The raison d'être for the ganzfeld experiments is the belief among some parapsychologists that an altered state facilitates picking up the psi signal because it lowers the noise-to-signal ratio from external sensory input. The touchstone of this protocol is the creation of an altered state in the receiver. This contrasts sharply with the remote viewing experiments in which the viewer is always in a normal state. More important is that the ganzfeld researchers believe that they get best results when each subject serves as his/her own judge. Those experiments in the ganzfeld database that employed both external judges and subjects as their own judges found that their results were more successful using subjects as their own judges. The reverse is true in the remote viewing experiments. The remote viewing experimenters believe that external judges provide much better hit rates than viewer-judges. This difference is even more extreme in the SAIC remote viewing experiments, where a single judge was used for all experiments. This judge, who was also the principal investigator, believed that he could achieve best results if he did the judging because of his familiarity with the response styles of the individual viewers.
So even if the ganzfeld and the SAIC remote viewing experiments have achieved significant effects and average effect sizes of approximately the same magnitude, there is no compelling reason to assume they are dealing with the same phenomena or phenomenon. To make such a claim entails showing that the alleged effect shows the same pattern of relationships in each protocol. Almost certainly, a positive theory of anomalous mental phenomena that predicts lawful relationships of a recognizable type will be necessary before a serious claim can be made that the same phenomenon is present across different research laboratories and experiments. Such a positive theory will be necessary also to tell us when we are and when we are not in the presence of this alleged anomalous cognition.
WHAT NEEDS TO BE EXPLAINED?
Professor Utts and many parapsychologists argue that they have produced evidence of an anomaly that requires explanation. They assert that the statistical effects they have documented cannot be accounted for in terms of normal scientific principles or methodological artifact. After reviewing the results from the SAIC experiments in the context of other contemporary parapsychological research, Utts is confident that more than an anomaly has been demonstrated. She believes the evidence suffices to conclude that the anomaly establishes the existence of psychic functioning.
This evidence for anomalous cognition, according to Utts and the parapsychologists, meets the standards employed by the other sciences. By this, I think Professor Utts means that in many areas of scientific inquiry the decision that a real effect has occurred is based on rules of statistical inference. Only if the null hypothesis of no difference between two or more treatments is rejected can the investigator claim that the differences are real in the sense that they are greater than might be expected on the basis of some baseline variability. According to this standard, it seems that the SAIC experiments as well as the recent ganzfeld experiments have yielded effects that cannot be dismissed as the result of normal variability.
While the rejection of the null hypothesis is typically a necessary step for claiming that an hypothesized effect or relationship has occurred, it is never sufficient. Indeed, because the underlying probability model is only an approximation, everyone realizes that the null hypothesis is rarely, if ever, strictly true. In practice, the investigator hopes that the statistical test is sufficiently robust that it will reject the null hypothesis only for meaningful departures from the null hypothesis. With sufficient power, the null hypothesis will almost certainly be rejected in most realistic situations. This is because effect sizes will rarely be exactly zero. Even if the true effect size is zero in a particular instance, sufficient power can result in the rejection of the null hypothesis because the assumed statistical model will depart from the real-world situation in other ways. For most applications of statistical inference, then, too much power can result in mistaken inferences as well as too little power.
Here we encounter another way in which parapsychological inquiry differs from typical scientific inquiry. The sciences that rely on statistical inference do so as an aid to weeding out effects that could be the result of chance variability. When effect sizes are very small or if the experimenter needs to use many more cases than is typical for the field to obtain significance, the conclusions are often suspect. This is because we know that with enough cases an investigator will get a significant result, regardless of whether it is meaningful or not. Parapsychologists are unique in postulating a null hypothesis that entails a true effect size of zero if psi is not operating. Any significant outcome, then, becomes evidence for psi. My concern here is that small effects and other departures from the statistical model can be expected to occur in the absence of psi. The statistical model is only an approximation. When power is sufficient and when the statistical test is pushed too far, rejections of the null hypothesis are bound to occur. This is another important reason why claiming the existence of an anomaly based solely on evidence from statistical inference is problematic.
This is one concern about claiming the existence of an anomaly on the basis of statistical evidence. In the context of this report, I see it as a minor concern. As I have indicated, I am willing to grant Professor Utts' claim that the rejection of the null hypothesis is probably warranted in connection with the SAIC and the ganzfeld databases. I have other concerns. Both have to do with the fact that no other science, so far as I know, would draw conclusions about the existence of phenomena solely on the basis of statistical findings. Although it is consistent with scientific practice to use statistical inference to reject the null hypothesis, it is not consistent with such practice to postulate the existence of phenomena on this basis alone. Much more is required. I will discuss at least two additional requirements.
Thomas Kuhn's classic characterization of normal and revolutionary science has served as the catalyst for many discussions about the nature of scientific inquiry. He popularized the idea that normal scientific inquiry is guided by what he called a paradigm. Later, in the face of criticisms, he admitted that he had used the term paradigm to cover several distinct and sometimes contradictory features of the scientific process. One of his key uses of the term paradigm was to refer to the store of exemplars or textbook cases of standard experiments that every field of scientific inquiry possesses. These exemplars are what enable members of a scientific community to quickly learn and share common principles, procedures, methods, and standards. These exemplars are also the basis for initiating new members into the community. New research is conducted by adapting one or more of the patterns in existing exemplars as guidelines about what constitutes acceptable research in the field under consideration.
Every field of inquiry, including parapsychology, has its stock of exemplars. In parapsychology these would include the classic card guessing experiments of J.B. Rhine, the Sheep-Goat experiments, etc. What is critical here is the striking difference between the role of exemplars in parapsychology as contrasted with their role in all other fields of scientific inquiry. These exemplars not only serve as models of proper procedure, but they also are teaching tools. Students in a particular field of inquiry can be assigned the task of replicating some of these classic experiments. The instructor can make this assignment with the confident expectation that each student will obtain results consistent with the original findings. The physics instructor, for example, can ask novice students to try Newton's experiments with colors or Gilbert's experiments with magnets. The students who do so will get the expected results. The psychology instructor can ask novice students to repeat Ebbinghaus' experiments on forgetting or Peterson and Peterson's classic experiment on short-term memory and know that they will observe the same relationships as reported by the original experimenters.
Parapsychology is the only field of scientific inquiry that does not have even one exemplar that can be assigned to students with the expectation that they will observe the original results! In every domain of scientific inquiry, with the exception of parapsychology, many core exemplars or paradigms exist that will reliably produce the expected, lawful relationships. This is another way of saying that the other domains of inquiry are based upon robust, lawful phenomena whose conditions of occurrence can be specified in such a way that even novices will be able to observe and/or produce them. Parapsychologists do not possess even one exemplar for which they can confidently specify conditions that will enable anyone--let alone a novice--to reliably witness the phenomenon.
The situation is worse than I have so far described. The phenomena that can be observed with the standard exemplars do not require sensitive statistical rejections of the null hypothesis based on many trials to announce their presence. The exemplar in which the student uses a prism to break white light into its component colors requires no statistics or complicated inference at all. The forgetting curve in the Ebbinghaus experiment requires nothing more than plotting proportion recalled against trial number. Yet, to the extent that parapsychology is approaching the day when it will possess at least one exemplar of this sort, the "observation" of the "phenomenon" will presumably depend upon the indirect use of statistical inference to document its presence.
In the standard domains of science, this problem of having not a single exemplar for reliably observing its alleged phenomenon would be taken as a sign that the domain has no central phenomena. When Soviet scientists announced the discovery of mitogenetic radiation, some western scientists attempted to replicate the findings. Some reported success; others reported mixed results; and many failed entirely to observe the effect. Eventually scientists, including the Soviets, abandoned the quest for mitogenetic radiation. Because no one, including the original discoverer, could specify conditions under which the phenomenon--if there be one--could be observed, the scientific community decided that there was nothing to explain other than as-yet-undetected artifacts. The same story can be told about N-Rays, Polywater, and other candidate phenomena that could not be reliably observed or produced. We cannot explain something for which we do not have at least some conditions under which we can confidently say it occurs. Even this is not enough. The alleged phenomenon not only must reliably occur at least under some conditions but it also must reliably vary in magnitude or other attributes as a function of other variables. Without this minimal amount of lawfulness, the idea that there is something to explain is senseless. Yet, at best, parapsychology's current claim to having demonstrated a form of anomalous cognition rests on the possibility that it can generate significant differences from the null hypothesis under conditions that are still not reliably specified.
I will suggest one more reason for my belief that it is premature to try to account for what the SAIC and the ganzfeld experiments have so far put before us. On the basis of these experiments, contemporary parapsychologists claim that they have demonstrated the existence of an "anomaly." I will grant them that they have apparently demonstrated that the SAIC and the ganzfeld experiments have generated significant effect sizes beyond what we should expect from chance variations. I will further admit that, at this writing, I cannot suggest obvious methodological flaws to account for these significant effects. As I have previously mentioned, this admission does not mean that these experiments are free from subtle biases and potential bugs. The experimental paradigms are too recent and insufficiently evaluated to know for sure. I can point to departures from optimality that might harbor potential flaws--such as the use of a single judge across the remote viewing experiments, the active coaching of viewers by the experimenter during judging procedures in the ganzfeld, my discovery of peculiar patterns of scoring in the ganzfeld experiments, etc. Having granted that significant effects do occur in these experiments, I hasten to add that without further evidence, I do not think we can conclude that these effects are all due to the same cause--let alone that they result from a single phenomenon that is paranormal in origin.
The additional reason for concern is the difference in the use of "anomaly" in this context and how the term "anomaly" is used in other sciences. In the present context, the parapsychologists are using the term "anomaly" to refer to apparently inexplicable departures from the null hypothesis. These departures are considered inexplicable in the sense that apparently all normal reasons for such departures from the null hypothesis have been excluded. But these departures are not lawful in the sense that the effect sizes are consistent. The effect sizes differ among viewers and subjects; they also differ for different experimenters; they come and go in inexplicable ways within the same subject. Possibly some of these variations in effect size will be found to exhibit some lawfulness in the sense that they will correlate with other variables. The SAIC investigators, for example, hope they have found such correlates in the entropy and bandwidth of targets. At the moment this is just a hope.
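To make explicit what such a departure from the null hypothesis amounts to in practice, the following sketch shows the kind of calculation involved. It is my own illustration, not a computation taken from the SAIC or ganzfeld data; it assumes a simple forced-choice design in which each trial has a one-in-four chance of a hit by guessing alone, and the trial and hit counts are hypothetical.

    # Illustration of how a departure from chance is quantified in a
    # forced-choice design with a 1-in-4 chance of a hit per trial.
    # The trial and hit counts below are hypothetical, not SAIC results.
    import math

    P_CHANCE = 0.25
    n_trials = 100
    n_hits = 34                      # hypothetical observed hits

    # Exact probability of observing at least this many hits by guessing alone.
    p_value = sum(
        math.comb(n_trials, k) * P_CHANCE**k * (1 - P_CHANCE)**(n_trials - k)
        for k in range(n_hits, n_trials + 1)
    )

    # A common effect-size measure: the standardized departure, scaled by trials.
    z = (n_hits - n_trials * P_CHANCE) / math.sqrt(n_trials * P_CHANCE * (1 - P_CHANCE))
    effect_size = z / math.sqrt(n_trials)

    print(f"hit rate    : {n_hits / n_trials:.2f} (chance = {P_CHANCE})")
    print(f"one-sided p : {p_value:.4f}")
    print(f"effect size : {effect_size:.2f}")

Note that the calculation only measures how far the results sit from the chance baseline; it says nothing by itself about why they sit there, which is precisely the point at issue.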
The term "anomaly" is used in a much more restricted sense in the other sciences. Typically an anomaly refers to a lawful and precise departure from a theoretical baseline. As such it is something that requires explaining. Astronomers were faced with a possible anomaly when discrepancies from Newtonian theory were reported in the orbit of Uranus. In the middle 1800s, Urbain Leverrier decided to investigate this problem. He reviewed all the data on previous sightings of Uranus--both before and after it had been discovered as a new planet. On the basis of the previous sightings, he laboriously recalculated the orbital path based on Newtonian theory and the reported coordinates. Sure enough, he found errors in the original calculations. When he corrected for these errors, the apparent discrepancy in Uranus' orbit was much reduced. But the newly revised orbit was still discrepant from where it should be on Newtonian theory. With this careful work, Leverrier had transformed a potential anomaly into an actual anomaly. Anomaly in this sense meant a precise and lawful departure from a well-defined theory. It was only after the precise nature, direction, and magnitude of this discrepancy had been carefully specified that Leverrier and the scientific community decided that here was an anomaly that required explanation. What had to be explained was quite precise. What was needed was an explanation that exactly accounted for this specific departure from the currently accepted theory.
Leverrier's solution was to postulate a new planet beyond the orbit of Uranus. This was no easy task because it involved the relatively unconstrained and difficult problem of inverse perturbations. Leverrier had to decide on a size, orbit, location, and other attributes of a hitherto unknown body whose characteristics would be just those to produce the observed effects on Uranus without affecting the known orbit of Saturn. Leverrier's calculations resulted in his predicting the location of this hitherto unknown planet, and the astronomer Galle located this new planet, Neptune, close to where Leverrier had said it would be.
The point of this story is to emphasize the distinction between the parapsychologists' use of anomaly and that of other scientists. Anomalies in most domains of scientific inquiry are carefully specified deviations from a formal theory. What needs to be explained or accounted for is precisely described. The anomalies that parapsychologists are currently talking about differ from this standard meaning in that the departures are from the general statistical model and are far from having the status of carefully specified and precise deviations from a theoretical baseline. In this latter case we do not know what it is that we are being asked to explain. Under what conditions can we reliably observe it? What theoretical baselines are the results a departure from? How much and in what direction and form do the departures exist? What specifically must our explanation account for?
Finally, I should add that some parapsychologists, at least in the recent past, have agreed with my position that parapsychological results are not yet ready to be placed before the scientific community. Parapsychologists such as Beloff, Martin Johnson, Gardner Murphy, J.G. Pratt and others have complained that parapsychological data are volatile and messy. Some of these investigators have urged their colleagues to first get their house in order before they ask the scientific community at large to take them seriously. Martin Johnson, especially, has urged his colleagues to refrain from asking the scientific community to accept their findings until they can tame them and produce lawful results under specified conditions. Clearly, parapsychology has still not reached this desired state. At best, the results of the SAIC experiments combined with other contemporary findings offer hope that the parapsychologists may be getting closer to the day when they can put something before the scientific community and challenge it to provide an explanation.
POTENTIALS FOR OPERATIONAL APPLICATIONS
It may seem obvious that the utility of remote viewing for intelligence gathering should depend upon its scientific validity. If the scientific research cannot confirm the existence of a remote viewing ability, then it would seem to be pointless to try to use this non-existent ability for any practical application. However, the matter is not this simple. If the scientific research confirms the existence of anomalous cognition, this does not guarantee that this ability would have useful applications. Ed May, in his presentation to the evaluation panel, gave several reasons why remote viewing could be real and, yet, not helpful for intelligence gathering. In his opinion, approximately 20 percent of the information supplied by a viewer is accurate. Unfortunately, at the time the remote viewer is generating the information, we have no way of deciding which portion is likely to be the accurate one. Another problem is that the viewer's information could be accurate, yet not relevant for the intelligence analyst's purposes.
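To make the force of May's point concrete, here is a minimal sketch, entirely of my own construction, of what an unidentifiable 20 percent accuracy rate implies for an analyst who must act on the reports. Only the 20 percent figure comes from May's presentation as reported above; the number of statements and the selection rule are hypothetical.

    # Illustration only: if 20% of a viewer's statements are accurate but the
    # accurate ones cannot be identified in advance, an analyst who acts on any
    # sample of statements is, in expectation, acting mostly on inaccurate ones.
    import random

    random.seed(0)
    ACCURACY = 0.2          # May's estimated accurate fraction (as reported)
    N_STATEMENTS = 1000     # hypothetical number of statements in a set of reports

    # Each statement is accurate with probability 0.2 but carries no label.
    statements = [random.random() < ACCURACY for _ in range(N_STATEMENTS)]

    # An analyst who selects, say, 50 statements to act upon gets roughly
    # 80% inaccurate ones, no matter how the sample is drawn.
    sample = random.sample(statements, 50)
    print("accurate statements acted upon:", sum(sample), "of", len(sample))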
This question is related to the problem of boundary conditions which I discussed earlier in this report. From both a scientific and an operational viewpoint the claim that anomalous cognition exists is not very credible until we have ways to specify when it is and when it is not present. So far, parapsychology seems to have concentrated only on finding ways to document the existence of anomalous cognition. The result is a patchwork quilt of markers that, when present, are offered as evidence for the presence of psi. These markers or indicators include the decline effect, negative hitting as well as positive hitting, displacement hitting, the incline effect, increased variability, decreased variability and just about any other way a discrepancy from a probability model can occur. A cynic will note that the absence of any or most of these markers is not used as evidence for the absence of psi. This lack of a way to distinguish between the presence and absence of anomalous cognition creates many challenges for parapsychology, some of which I have already discussed.
So, even if remote viewing is a real ability possessed by some individuals, its usefulness for intelligence gathering is questionable. If May is correct, then 80% of all the information supplied by this talented viewer will be erroneous. Without any way to tell which statements of the viewers are reliable and which are not, the use of this information may make matters worse rather than better.
Can remote viewing have utility for information gathering even if it cannot be scientifically validated? I can imagine some possibilities for remote viewing to be an asset to the intelligence analyst even when the viewer possesses no valid paranormal powers. The viewer might be a person of uncommonly good sense or have a background that enables him or her to provide helpful information even if it does not come from a paranormal source. Another possibility is that the viewer, even though lacking in any truly accurate intelligence information, might say things or open up new ways of dealing with the analyst's problem. In this latter scenario the remote viewer is a catalyst that may open up new ways of looking at an intelligence situation much like programs for problem solving and creative thinking stimulate new ways of looking at a situation. However, if the usefulness of the remote viewer reduces to a matter of injecting common sense or new perspectives into the situation, I believe that we can accomplish the same purpose in more efficient ways.
In considering potential utility, I am most concerned about the separation of the operational program in remote viewing from the research and development phase. By default, the assessment of the usefulness of remote viewing in the operational arena is decided entirely by subjective validation or what May and Utts call prima facie evidence. Granted, it is difficult to assess adequately the effectiveness of remote viewing in the operational domain. Nevertheless, better ways can be devised than have apparently been used up to now. In our current attempt to get an initial idea about the effectiveness of the current operational use of remote viewing, we have simply been asking individuals and agencies who have used the services of the remote viewers whether the information they received was accurate and useful. Whatever information we get from this survey is extremely limited for the purposes of judging the utility of remote viewing in the operational domain.
Even psychologists who should know better underrate the power of subjective validation. Anyone who relies on prima facie evidence as a basis for affirming the validity of remote viewing should carefully read that portion of Marks and Kammann's The Psychology of the Psychic [1981] in which they discuss the SRI and their own experiments on remote viewing. In the early stages of their attempt to replicate the SRI remote viewing experiments, they were astonished at the high quality of their subjects' protocols and the apparent accuracy of the viewing. After each session, the experimenters and the subject (viewer) would visit the target site and compare the verbal protocol with the actual site. The specific details of the viewers' responses appeared to match specific objects in the target site with uncanny accuracy. When they gave the verbal protocols to the judge, a distinguished professor, to blindly match against the actual target sites, he was astonished at how well what he considered the closest matching protocol for each site matched actual details of the target. He had no doubt that the viewers had demonstrated strong remote viewing abilities.
So, both the viewers and the judge quickly became convinced of the reality of remote viewing on the basis of the uncanny matches between the verbal descriptions and the actual target sites. The experimenters received a rude awakening when they discovered that, despite the striking matches observed between target and verbal description, the judge had matched the verbal protocols to the wrong target sites. When all parties were given the results, the subjects could not understand how the judge could have matched any but the actual target site to their descriptions. For them the match was so obvious that it would be impossible for the judge to have missed it. The judge, on the other hand, could not accept that any but the matches he made could be paired with the actual target sites.
This phenomenon of subjective validation is pervasive, compelling and powerful. Psychologists have demonstrated it in a variety of settings. I have demonstrated it and written about it in the context of the psychic reading. In the present context, subjective validation comes about when a person evaluates the similarity between a relatively rich verbal description and an actual target or situation. Inevitably, many matches will be found. Once the verbal description has been judged to be a good match to a given target, the description gets locked in and it becomes virtually impossible for the judge to see the description as fitting any but the original target.
Unfortunately, all the so-called prima facie evidence put before us is tainted by subjective validation. We are told that many of the details supplied by the viewers were indeed inaccurate. But some details were uncannily correct and even, in one case, hidden code words were correctly revealed. Such accounts do indeed seem compelling. They have to be put in the context, however, of all such operational attempts. We have to know the general background and expectations of the viewers, the questioners, etc. Obviously, the targets selected for the viewers in the operational setting will have military and intelligence relevance. If the viewer [some of the viewers have intelligence backgrounds] suspects the general nature of the target, then previous background knowledge might very well make the presence, say, of a gantry highly likely. In addition, the interactions and questioning of the viewers in these settings appear to be highly suggestive and leading.
I can imagine that the preceding paragraph might strike a reader as being unreasonable. Even allowing for subjective validation, the possibility that a viewer might accurately come up with secret code words and a detailed description of a particular gantry is quite remote on the basis of common sense and sophisticated guessing. I understand the complaint and I realize the reluctance to dismiss such evidence out of hand. However, I have had experience with similarly compelling prima facie evidence for more than a chance match between a description and a target. In the cases I have in mind, however, double blind controls were used to pair descriptions with the true as well as with the wrong target sites. In all these test cases with which I am familiar, the unwitting subjects found the matches between their descriptions and the presumed target equally compelling regardless of whether the presumed target was the actual or the wrong one.
What this says about operational effectiveness is that, for evaluation purposes, half of the time the viewers and the judges should be misled about what the actual target was. In these cases, both the interrogator and the viewer, as well as the judge, have to be blind to the actual targets. Under such conditions, if the judges and the others find the matches between the verbal descriptions and the actual targets consistently better than the matches between the verbal descriptions and the decoy targets, then this would constitute some evidence for the effectiveness of remote viewing. I can confidently predict, regardless of the outcome of such an evaluation, that many of the verbal descriptions when matched with decoy targets will be judged to be uncanny matches.
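One way such a decoy-controlled evaluation might be organized is sketched below. This is my own illustration, not a protocol drawn from the SAIC or operational programs; the session counts and similarity ratings are hypothetical placeholders for the blind ratings a judge would actually supply.

    # Sketch of a decoy-controlled evaluation: half of the sessions are paired,
    # at random, with the true target and half with a decoy. Neither the judge
    # nor the viewer knows which is which; the judge simply rates how well the
    # description matches the target he or she is shown.
    import random
    from statistics import mean

    random.seed(1)

    def assign_conditions(n_sessions):
        """Randomly label half the sessions 'actual' and half 'decoy'."""
        labels = ["actual"] * (n_sessions // 2) + ["decoy"] * (n_sessions - n_sessions // 2)
        random.shuffle(labels)
        return labels

    def summarize(ratings, labels):
        """Compare the judge's blind ratings for actual- versus decoy-paired sessions."""
        actual = [r for r, c in zip(ratings, labels) if c == "actual"]
        decoy = [r for r, c in zip(ratings, labels) if c == "decoy"]
        return mean(actual), mean(decoy)

    # Hypothetical ratings on a 0-10 similarity scale, one per session.
    labels = assign_conditions(20)
    ratings = [random.randint(3, 8) for _ in labels]   # placeholder data
    actual_mean, decoy_mean = summarize(ratings, labels)
    print("mean rating, actual targets:", actual_mean)
    print("mean rating, decoy targets: ", decoy_mean)
    # Evidence for effectiveness would require the actual-target mean to be
    # consistently and substantially higher than the decoy-target mean.

The essential design choice is that compellingness of a match is never taken at face value; it is always measured against how compelling the same descriptions appear when paired with targets they could not possibly refer to.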
SUGGESTIONS: WHAT NEXT?
I have played the devil's advocate in this report. I have argued that the case for the existence of anomalous cognition is still shaky, at best. On the other hand, I want to state that I believe that the SAIC experiments as well as the contemporary ganzfeld experiments display methodological and statistical sophistication well above previous parapsychological research. Despite better controls and careful use of statistical inference, the investigators seem to be getting significant results that do not appear to derive from the more obvious flaws of previous research. I have argued that this does not justify concluding that anomalous cognition has been demonstrated. However, it does suggest that it might be worthwhile to allocate some resources toward seeing whether these findings can be independently replicated. If so, then it will be time to reassess if it is worth pursuing the task of determining if these effects do indeed reflect the operation of anomalous cognition. This latter quest will involve finding lawful relationships between attributes of this hypothesized phenomenon and different independent variables. Both the scientific and operational value of such an alleged phenomenon will depend upon how well the conditions for its occurrence can be specified and how well its functioning can be brought under control.
Both Professor Utts and I agree that the very first consideration is to see if the SAIC remote viewing results will still be significant when independent judges are used. I understand Ed May's desire to use a judge who is very familiar with the response styles of the experienced viewers. However, if remote viewing is real, then conscientious judges, who are blind to the actual targets, should still be able to match the verbal descriptions to the actual targets better than chance. If this cannot be done, the viability of the case for remote viewing becomes problematical. On the other hand, assuming that independent judges can match the descriptions to the correct targets reasonably well, then it becomes worthwhile to try to independently replicate the SAIC experiments.
At this point we face some interesting questions. Should we try to replicate the remote viewing studies by using the same viewers, the same targets, and the same protocol? Perhaps change only the experimenters, the judge, and the laboratory? At some point we would also want to change the targets. For completeness, we would also want to search for new, talented viewers.
If independent replications confirm the SAIC findings, we still have a long way to go. However, at this stage in the proceedings, the scientific community at large might be willing to acknowledge that an anomaly of some sort has been demonstrated. Before the scientific community will go beyond this acknowledgment, the parapsychologists will have to devise a positive theory of anomalous communication from which they can make testable predictions about relationships between anomalous communication and other variables.
CONCLUSIONS
The Scientific Status of the SAIC Research Program
1. The SAIC experiments on anomalous mental phenomena are statistically and methodologically superior to the earlier SRI remote viewing research as well as to previous parapsychological studies. In particular, the experiments avoided the major flaw of non-independent trials for a given viewer. The investigators also made sure to avoid the problems of multiple statistical testing that were characteristic of much previous parapsychological research.
2. From a scientific viewpoint, the SAIC program was hampered by its secrecy and the multiple demands placed upon it. The secrecy kept the program from benefiting from the checks and balances that come from doing research in a public forum. (1) Scrutiny by peers and replication in other laboratories would have accelerated the scientific contributions from the program. The multiple demands placed on the program meant that too many things were being investigated with too few resources. As a result, no particular finding was followed up in sufficient detail to pin it down scientifically. Ten experiments, no matter how well conducted, are insufficient to fully resolve one important question, let alone the several that were posed to the SAIC investigators.
3. Although I cannot point to any obvious flaws in the experiments, the experimental program is too recent and insufficiently evaluated to be sure that flaws and biases have been eliminated. Historically, each new paradigm in parapsychology has appeared to its designers and contemporary critics as relatively flawless. Only subsequently did previously unrecognized drawbacks come to light. Just as new computer programs require a shakedown period before hidden bugs come to light, each new scientific program requires scrutiny over time in the public arena before its defects emerge. Some possible sources of problems for the SAIC program are its reliance on experienced viewers and the use of the same judge--one who is familiar to the viewers--for all the remote viewing.
4. The statistical departures from chance appear to be too large and consistent to attribute to statistical flukes of any sort. Although I cannot dismiss the possibility that these rejections of the null hypothesis might reflect limitations in the statistical model as an approximation of the experimental situation, I tend to agree with Professor Utts that real effects are occurring in these experiments. Something other than chance departures from the null hypothesis has occurred in these experiments.
5. However, the occurrence of statistical effects does not warrant the conclusion that psychic functioning has been demonstrated. Significant departures from the null hypothesis can occur for several reasons. Without a positive theory of anomalous cognition, we cannot say that these effects are due to a single cause, let alone claim they reflect anomalous cognition. We do not yet know how replicable these results will be, especially in terms of showing consistent relations to other variables. The investigators report findings that they believe show that the degree of anomalous cognition varies with target entropy and the "bandwidth" of the target set. These findings are preliminary and only suggestive at this time. Parapsychologists, in the past, have reported finding other correlates of psychic functioning such as extroversion, the sheep/goat variable, and altered states, only to find that later studies could not replicate them.
6. Professor Utts and the investigators point to what they see as consistencies between the outcome of contemporary ganzfeld experiments and the SAIC results. The major consistency is the similarity of average effect sizes across experiments. Such consistency is problematical because these average effect sizes, in each case, are the result of arbitrary combinations from different investigators and conditions. None of these averages can be justified as estimating a meaningful parameter (see the illustration following these conclusions). Effect size, by itself, says nothing about its origin. Where parapsychologists see consistency, I see inconsistency. The ganzfeld studies are premised on the idea that viewers must be in an altered state for successful results. The remote viewing studies use viewers in a normal state. The ganzfeld experimenters believe that the viewers should judge the match between their ideation and the target for best results; the remote viewing investigators believe that independent judges provide better evidence for psi than viewers judging their own responses. The recent autoganzfeld studies found successful hitting only with dynamic targets and only chance results with static targets. The SAIC investigators, in one study, found hitting with static targets and not with dynamic ones. In a subsequent study they found hitting for both types of targets. They suggest that they may have a solution to this apparent inconsistency in terms of their concept of bandwidth. At this time, this is only suggestive.
7. The challenge to parapsychology, if it hopes to convincingly claim the discovery of anomalous cognition, is to go beyond the demonstration of significant effects. The parapsychologists need to achieve the ability to specify conditions under which one can reliably witness their alleged phenomenon. They have to show that they can generate lawful relationships between attributes of this alleged phenomenon and independent variables. They have to be able to specify boundary conditions that will enable us to detect when anomalous cognition is and is not present.
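As noted in conclusion 6, average effect sizes pooled from heterogeneous studies need not estimate any meaningful parameter. The brief sketch below, using per-study values I have invented purely for illustration, shows how a pooled mean can be computed even when the underlying studies disagree, in which case the average describes none of them.

    # Hypothetical per-study effect sizes from different investigators and
    # conditions. The pooled average is easy to compute, but if the studies
    # are heterogeneous it does not estimate any single underlying parameter.
    from statistics import mean, stdev

    effect_sizes = [0.45, 0.02, -0.10, 0.50, 0.18]   # invented values
    print("pooled mean effect size:", round(mean(effect_sizes), 2))
    print("spread across studies  :", round(stdev(effect_sizes), 2))
    # A mean of about 0.2 with a spread this large describes none of the
    # individual studies; consistency would require the spread itself to be
    # small and the conditions under which it was obtained to be comparable.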
Suggestions for Future Research
1. Both Professor Utts and I agree that the first step should be to have the SAIC protocols rejudged by independent judges who are blind to the actual target.
2. Assuming that such independent judging confirms the extra-chance matchings, the findings should be replicated in independent laboratories. Replication could take several forms. Some of the original viewers from the SAIC experiments could be used. However, it seems desirable to use a new target set and several independent judges.
Operational Implications
1. The current default assessment of the operational effectiveness of remote viewing is fraught with hazards. Subjective validation is well known to generate compelling, but false, convictions that a description matches a target in striking ways. Better, double-blind ways of assessing operational effectiveness can be used. I suggest at least one way in the report.
2. The ultimate assessment of the potential utility of remote viewing for intelligence gathering cannot be separated from the findings of laboratory research.
------------------
(1) The SAIC did benefit from the input of a distinguished oversight committee. But this still falls far short of what could have taken place in an open forum.