3.3 Study Findings
The analysis process employed by all the study participants generally followed
the pattern shown in Figure 11. Reports were selected from the database by
refining keyword queries and by browsing the returned reports by title or date.
Participants relied heavily on a small number of the sampled reports, which we
refer to as "key" documents. The key documents formed the skeleton of the
analysis product. Excerpts from supporting documents were then used to
corroborate some of the information and fill in details. Conflicts in the data
were flagged, and judgments about which data to include in the developing story
were revisited as new information on the topic was discovered. When the study
participants felt ready, they organized their notes and generated a coherent
story.

3.3.1 Search strategy: Sampling documents by narrowing in
In inferential analysis under data overload in baseline electronic environments
with textual databases, information is effectively sampled, generally through
querying and browsing. In our study, participants were observed to begin the
analysis process by making queries with standard inputs such as keywords and
date limits. If a returned set of documents was judged to be too large, the search
was narrowed rather than restarted with a new set of search terms. Typical
narrowing strategies included adding a keyword, limiting to a date range, or
enforcing a proximity requirement on a set of keywords. The search was then
further narrowed by browsing summary information about each document,
typically its date and title. Finally, documents were opened by double-clicking
on a report title.
A subset of the opened documents was judged to be relevant to the analysis. Of
this set, a small number were used as the basis for the analysis; we refer to
these as "key" documents. For this study, which documents were treated as "key"
was determined from converging behavioral and verbal data in the process
traces. The key documents were associated with verbalizations such as "Here we
go!" or "That's a good one!" In addition, the participants were often observed
to spend longer reading them than other documents, to copy much of their text
into their electronic notes, and/or to use the marking function in the database
software to highlight their titles in the browsing window. The phrases used in
the verbal briefings provided converging evidence about which documents were
heavily relied upon in the analysis process.
To illustrate this process, consider the information sampling process employed
by study participant 5 during the analysis (Figure 12). The participant started
with a Boolean keyword search (esa OR (european AND space AND agency)).
This search returned 725 hits. After scanning three articles and determining
that the accident occurred on June 4, 1996, he narrowed the search to documents
published after June 1, 1996. This narrowing left 419 documents, which became
his "home query" in that he performed no further keyword searches. He opened
28 documents during the analysis, 24 of which were on-topic, or relevant to the
analysis. Six of the documents that he opened were "high-profit" in that the
investigators judged them to be highly informative. The remaining three
high-profit documents were available in the database but were not returned by
either query. The participant cut and pasted portions of eight documents, along
with their references, into a word processing file. He also used a marking
function in the software to highlight two documents: one because he stated that
it was a remarkably good article, and one in case he needed to refer back to it
later in the analysis for further information. Three articles were identified
as his "key" documents: 1) document 1223, because he remarked that it was
"remarkably good" and spent a long time reading it; 2) document 1301, because
he spent a long time reading it, made many verbalizations about details of the
accident while reading it, and said afterward that he now had a good idea of
what had happened; and 3) document 1882, because he said that it was "a definite
keeper" and comparable in quality to briefings by professional analysts, spent a
long time reading it, cut and pasted the most text from it, and made many
verbalizations while reading it. All three of his key documents were
high-profit documents.
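The narrowing-in sequence above can be sketched in Python. This is a minimal illustration, assuming simple whole-word keyword matching; the `Report` record, the three-report corpus, and the helper names are hypothetical, and the study database's actual matching rules (stemming, proximity, ranking) are not modeled.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Report:
    doc_id: int
    title: str
    published: date
    text: str

def matches_initial_query(report: Report) -> bool:
    # Boolean keyword query: esa OR (european AND space AND agency)
    words = set(report.text.lower().split())
    return "esa" in words or {"european", "space", "agency"} <= words

def narrow_by_date(reports, earliest: date):
    # Narrowing step: keep only reports published after the cutoff date
    return [r for r in reports if r.published > earliest]

# Hypothetical toy corpus, not reports from the study database
corpus = [
    Report(1, "Ariane 5 explodes", date(1996, 6, 5), "ESA rocket lost on maiden flight"),
    Report(2, "Space budget", date(1996, 3, 2), "european space agency budget review"),
    Report(3, "Weather", date(1996, 6, 10), "rain expected"),
]

hits = [r for r in corpus if matches_initial_query(r)]   # initial query
home = narrow_by_date(hits, date(1996, 6, 1))            # the "home query"
print([r.doc_id for r in home])  # prints [1]
```

Report 2 matches the keyword query but falls before the date cutoff, which mirrors how the date limit shrank participant 5's set from 725 to 419 hits without any new search terms.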
The information sampling strategy for study participant 5 was essentially one of
continually narrowing in. An initial query was refined to reach a document set
that was judged manageable based on the number of hits. A small subset of these
documents was then heavily relied upon in generating the analysis product.
The search processes of all the study participants (Figure 13) show that this
process was representative. All of the participants narrowed their queries to a
number of documents that they judged to be manageable (22 to 419), from which
they opened documents based on a view of the dates and titles (4 to 29). They
then relied heavily on a subset of these documents (1 to 4) for their verbal
briefings.

This pattern suggests that, under data overload conditions, narrowing in on a
small subset of information is a commonly used coping strategy. Others have
observed this propensity to narrow returned sets based on the number of hits
almost indiscriminately when the data sets are large (Blair, 1980 observed this
pattern with users of indexed databases and explained the pattern as a result of
overestimating the probability of conjunctive sets; Olsen, Sochats, and Williams,
1998 discuss the overuse of adding keyword terms to narrow document sets).
Although effective in making the amount of data to be browsed manageable, this
coping strategy leaves analysts vulnerable to missing critical information, such as
the high profit documents not opened by the study participants.
The narrowing strategies employed by the participants are relatively primitive
compared to tactics described in the information retrieval literature (see Bates,
1979 for search tactics to narrow the number of documents that are returned by a
query). The emphasis appears to be on quickly getting to a number of
documents that can be browsed rather than seeking a high quality, precise, or
exhaustive set of information. For example, the participants did not use
orthogonal facets to narrow the number of returned hits. This strategy involves
combining synonyms within a facet using "OR" and then crossing orthogonal facets
using "AND". Instead, some of the terms used to narrow the search were
synonyms, as when participant 6 ANDed fail* with a query that already included
(destr* OR explo*).
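To make the contrast concrete, the sketch below builds a faceted query string and evaluates it against a piece of text, treating a trailing `*` as a prefix wildcard. The helper names, the second facet's terms, and the sample sentence are hypothetical illustrations; only the terms destr*, explo*, and fail* come from participant 6's query.

```python
def or_group(terms):
    # Synonyms within one facet are combined with OR.
    return "(" + " OR ".join(terms) + ")"

def faceted_query(facets):
    # Orthogonal facets are crossed with AND.
    return " AND ".join(or_group(f) for f in facets)

def term_matches(term, words):
    # A trailing '*' is treated as a prefix wildcard.
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def matches(facets, text):
    # A document must match at least one term from every facet.
    words = text.lower().split()
    return all(any(term_matches(t, words) for t in facet) for facet in facets)

event = ["destr*", "explo*", "fail*"]        # one facet: synonyms for the event
vehicle = ["ariane", "rocket", "launcher"]   # an orthogonal facet (hypothetical terms)
report = "rocket exploded shortly after liftoff"

print(faceted_query([event, vehicle]))
# (destr* OR explo* OR fail*) AND (ariane OR rocket OR launcher)
print(matches([event, vehicle], report))                   # True
print(matches([["destr*", "explo*"], ["fail*"]], report))  # False
```

The last line shows the cost of ANDing a synonym as if it were a separate facet: a relevant report that happens to use only one of the synonyms is excluded.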
The finding that the study participants used relatively primitive search strategies
is not surprising in the context of the growing information retrieval literature on
other domain expert end-users who conduct their own searches but are not
search experts (e.g., legal analysts, Blair and Maron, 1985). Across a number of
studies, there is converging evidence that although domain experts can quickly
learn to conduct simple searches, many never learn to employ more
sophisticated search techniques.
One caution in drawing implications from this finding is that it does not
necessarily mean that all intelligence analysts should use professional search
intermediaries to perform their searches. It is a consistent finding in information
retrieval studies that both domain knowledge and search expertise are important
in seeking information, and that one is not significantly more important than the
other (Saracevic, Kantor, Chamis, and Trivison, 1988). Also, these two sources of
knowledge are only partially decomposable, and may in fact interact in
important ways (Shute and Smith, 1992).
It is not surprising, given the type of computer support that was provided to the
participants, that all of the participants missed high profit documents without
being aware of it (cf. Blair and Maron's 1985 landmark study of legal analysts,
who were poorly calibrated to the amount of relevant information that they were
missing when searching an electronic database). The document sets returned by
the keyword searches were essentially opaque about how they related to what was
available, for example, about which high profit documents had been left out of
the query results. Documents were then sampled based on their dates and titles,
which were also weak indicators of whether documents were high profit: as
Table 1 shows, high profit documents and particularly poor-quality documents are
indistinguishable by these cues. The first "low profit" article
was a translated description of an article originally published in Italy that
contained inaccuracies about the details of the cause of the software failure. The
second article was a one-paragraph abstract and so contained very little
information. The third article contained significant inaccuracies because it was
published soon after the event occurred.

Note that during the process of searching for information, some study
participants verbalized that perhaps they should conduct new searches for
specific information, but did not. In addition, comments made by some of the
study participants indicated that they did not know what was available in the
database and how their queries related to what was available, which made them
uncomfortable. In spite of these statements, the study participants appeared
reluctant to leave the working area that the home query window represented.
The participants had become familiar with the titles and dates of the documents
returned by the query, had often sorted the documents by date, had resized the
windows and placed them in a dedicated place on the screen, and had marked some
of the documents for various reasons.
3.3.2 Basing analyses on high profit documents
Looking more closely at the process traces in Figure 13, the black circles
represent when the key documents were also high profit documents, or in other
words, when the documents that were heavily relied upon were the best
documents available in the database. Comparing the four participants who used
some high profit documents as key documents with the four who did not reveals
some interesting differences between the two groups (Tables 2 and 3). The
participants who used high profit documents as key documents spent more time on
the analysis, read more documents, and read more of the high profit
documents.


We believe that the best explanation for the differences between these two
groups is that the participants who found the high profit documents were more
"persistent," in that they took longer and read more documents. As a result,
they were more likely to find the high profit documents. There could, however,
be alternative explanations for the differences between these two groups. It is
generally recognized in the information retrieval literature that both search and
domain expertise are important in information seeking. Therefore, it is possible
that the group of analysts that relied on the high profit documents used more
effective search strategies to find the documents. Similarly, it is possible that the
more experienced professional analysts had developed strategies that helped
them to perceive high profit documents, or that domain- or scenario-related
expertise would make it easier for them to recognize high profit documents. We
investigated nine potential hypotheses relating to these possibilities and found
little support for these alternative explanations (Patterson, Woods, and Roth,
1999).
3.3.3 Impact of basing analyses on high profit documents
An important question to answer is whether the study participants who used the
high profit documents as key documents in their analyses performed better than
those that did not. Although analysts in prior interviews had stated that they
considered it critically important to have high-quality documents, it is possible
that they had developed expert strategies that allowed them to use converging
information from lower quality sources in such a way as to perform well despite
having to rely on lower quality information.
To this end, the study participants' verbal briefings were coded on 20 topic items
from the Ariane 501 case as accurate, vague, inaccurate, or no information
(Table 4).2 The results suggest that there may in fact be differences in
performance between the participants who relied upon the high profit documents
and the participants who did not. As would be expected, the participants who
relied on high profit documents in their analysis made fewer inaccurate
statements in their verbal briefings than the participants whose key documents
included no high profit documents (1 vs. 6, p = 0.03). Note that this difference
is not explained by one group of participants giving more thorough briefings,
and thereby having more opportunities for inaccurate statements, because there
were no significant differences between the two groups in the overall number of
items included in the briefings. Also, years of analytic experience did not
differ significantly between the groups (11 years vs. 10.5 years).
2 Intercoder reliability between two coders working simultaneously was 84% for the
eight study participants. Discrepancies were resolved by discussion, and both
coders agreed on the final codes.

3.3.4 Sources of inaccurate statements
Two main conceptual frameworks were used to look for patterns in the analytic
processes. The first framework was information sampling strategies, generally
referred to as search tactics in the information retrieval literature. The second
framework was evidence interactions in abductive inference, or inference to the
best explanation (Josephson and Josephson, 1994). Diagnosis is a well-known
example of an abductive inference process, in which a diagnostic
reasoner selects an explanatory hypothesis to explain observed symptoms. The
abductive process involves observing deviations from a nominal state, proposing
explanatory hypotheses to account for the deviations, and selecting the "best" or
most warranted explanation from the hypothesis set.
Determining the cause of the Ariane 501 accident could be characterized as an
abductive inference task. There is anomalous data that could be explained by
several hypotheses (Figure 14). For example, the observation that the rocket
swiveled abnormally could have been due to poor guidance data, a mechanical
failure, or a software failure. The main observation that pointed to a software
failure hypothesis rather than other hypotheses was that both the primary and
backup Inertial Reference Systems (IRS) shut down simultaneously. Although
this finding made the software failure the most plausible explanation, there was
an additional finding that this hypothesis did not cover: unexpected roll
torque during ascent. The full set of observations was explained by the
combination of two hypotheses: a software failure and an unrelated mechanical
problem.
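One simple way to formalize this example is sketched below. It assumes that the "best" explanation is the smallest set of hypotheses that jointly covers every observation, a parsimonious-covering simplification that ignores the plausibility judgments central to abductive inference. The hypothesis-to-observation mapping is our illustrative reading of the example, not Figure 14 itself; in particular, treating the mechanical failure as covering both the swivel and the roll torque is an assumption.

```python
from itertools import combinations

def minimal_explanations(coverage, observations):
    """Return the smallest combinations of hypotheses that cover every observation."""
    hypotheses = list(coverage)
    for size in range(1, len(hypotheses) + 1):
        covers = [
            combo
            for combo in combinations(hypotheses, size)
            if set().union(*(coverage[h] for h in combo)) >= observations
        ]
        if covers:  # stop at the first (smallest) size that explains everything
            return covers
    return []

# Hypothetical coverage: which observations each hypothesis could account for.
coverage = {
    "poor guidance data": {"abnormal swivel"},
    "mechanical failure": {"abnormal swivel", "roll torque"},
    "software failure": {"abnormal swivel", "dual IRS shutdown"},
}
observations = {"abnormal swivel", "dual IRS shutdown", "roll torque"}

print(minimal_explanations(coverage, observations))
# [('mechanical failure', 'software failure')]
```

No single hypothesis covers both the simultaneous IRS shutdown and the roll torque, so the smallest covering set is the pair of a software failure and a mechanical problem.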

During the data analysis, we were surprised to discover that the think-aloud
protocols and the decisions regarding data conflicts provided remarkably little
evidence of this traditional abductive inference process. Rather than gathering
a collection of data, determining what hypotheses would explain the data, and
comparing the plausibility of different combinations of hypotheses in order to
come up with a best explanation, the study participants appeared to be
following a different process. The main difference between the theoretical
pattern of abductive inference and the empirical evidence was that the study
participants were not dealing with elemental observations and hypotheses.
Instead, they were dealing with a "second order" set of data in which
interpretive frames already existed: the report writers had assumed particular
hypotheses and presented data mainly in support of them. The main task of the
study participants, therefore, was to improve the veracity of the analytic
product by corroborating multiple reports from others who had already performed
the task of mapping explanatory hypotheses to a dynamically changing data set.
Given this situation, the "hypothesis space" for the simulated task was better
represented by Figure 15 than by Figure 14. Rather than working with the
"elemental" hypotheses and data of the Ariane 501 scenario, the study
participants, according to the think-aloud protocols, worked at the "second
order" level, using cues from the text, document, and source to decide how to
resolve data conflicts. The study participants displayed expertise in
recognizing the cues used in evaluating the information and in relating those
cues to possible hypotheses.3

Using the abductive inference framework as a conceptual guide, processes that
resulted in inaccurate statements in the verbal briefings were examined to better
understand the cognitive challenges and potential vulnerabilities. Tracing why
the inaccurate statements were made with the process tracing methodology
identified three sources of inaccurate statements that provide insight into
the cognitive demands of inferential analysis under data overload: 1) relying
upon assumptions that would normally be correct but did not apply in this
situation, 2) repeating inaccurate information from a document that they had
read, and 3) relying upon information that was considered accurate at one
point in time but was later overturned by subsequent updates.
3 Note that this expertise would probably not be available to surrogate participants such as undergraduate
students.
3.3.4.1 Relying on assumptions that did not apply
One source of inaccurate statements during the analysis process was the study
participants relying on default assumptions that did not apply in this scenario.
Several inaccurate statements made during the verbal briefings did not come
from any of the documents that were opened. In most of these cases, the
participants appeared to fill gaps in the story with assumptions that did not
apply in this case. For example, during the verbal briefing, one participant
stated that the monetary loss of the Cluster satellite payload could be
recovered through insurance. Although payloads are often insured, in this case
the Cluster satellites were not.
Relying on assumptions is clearly a heuristic that can be applied under time
pressure as a coping strategy. Although relying on assumptions led to
inaccurate statements in some instances, in other cases it did not. For example,
participant 2 used the assumption that the Ariane 5 rocket would
eventually replace the Ariane 4 as the standard launch vehicle in his estimation
of the impacts of the failure. In addition to filling in gaps in knowledge, default
assumptions also proved valuable in knowing what information to seek during
the analysis process. For example, participant 4 stated that he assumed that
there were payloads on the flight and then looked explicitly to see if there were.
3.3.4.2 Incorporating information that was inaccurate
The second main source of inaccurate statements was inaccurate descriptions in
documents in the database. Intelligence analysts clearly view the elimination of
inaccuracies by finding converging evidence across independent sources as a
major component of the value of an analytic product. The participants described
and employed a variety of strategies for tracking and resolving discrepant
descriptions in order to reduce their vulnerability to incorporating inaccurate
information. Partly because this cognitively difficult process of corroborating
information and resolving conflicting information was unsupported by the tools
that they were provided, nearly every participant experienced some breakdowns
in this process. Breakdowns included failing to corroborate information, missing
conflicts in documents that were opened, forgetting how many corroborating
and conflicting descriptions had been read from independent sources, forgetting
the information sources, and treating descriptions that stemmed from the same
source as corroborating (cf., Schum, 1994, evidence interactions in inferential
analysis).
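The last breakdown, counting reprints of one report as independent corroboration, can be guarded against by grouping descriptions by their originating source before counting support. The sketch below assumes that each description is already tagged with a claim and an origin; the report identifiers, claims, origins, and function names are hypothetical, and producing such tags is itself the hard part that the provided tools did not support.

```python
from collections import defaultdict

def independent_support(descriptions):
    """Map each claim to the set of distinct originating sources behind it."""
    support = defaultdict(set)
    for _doc_id, claim, origin in descriptions:
        support[claim].add(origin)  # reprints of the same origin count only once
    return support

def open_conflicts(support, incompatible_pairs):
    """Return incompatible claim pairs for which both sides have been read."""
    return [(a, b) for a, b in incompatible_pairs if a in support and b in support]

# Hypothetical tagged descriptions: (report id, claim, originating source)
descriptions = [
    ("r1", "inertial frame reset", "wire service A"),
    ("r2", "inertial frame reset", "wire service A"),  # reprint of r1
    ("r3", "diagnostic data read as commands", "agency statement"),
    ("r4", "diagnostic data read as commands", "trade journal"),
]
support = independent_support(descriptions)
print({claim: len(origins) for claim, origins in support.items()})
# {'inertial frame reset': 1, 'diagnostic data read as commands': 2}
print(open_conflicts(support, [("inertial frame reset", "diagnostic data read as commands")]))
```

Four reports reduce to three independent sources, and the conflict between the two explanations is surfaced explicitly rather than left to the analyst's memory.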
To illustrate some of the difficulties in eliminating inaccuracies, consider
how participants determined why the rocket swiveled abnormally. Interestingly,
participants 6 and 7 both read the same two
documents that contained discrepant descriptions but ended up with different
outcomes in their verbal briefings (Figures 16 and 17).
Participant 6 based his analysis of why the rocket swiveled mainly on report 858,
which described the cause as a reset of the inertial reference frame following a
numeric overflow (Figure 16). As he read 858, he was verbalizing why the rocket
swiveled based on what he was reading. Later, he read 1385, which had a
contradictory description of why the rocket swiveled. However, it was the last
document that he looked at, and by then he was focused on a different issue:
why testing did not reveal the software error. He gave no
evidence that he recognized the conflict. In addition, when asked how he knew
when to stop the analytic process, he explained: "It doesn't look like anybody
will have any different opinions. From looking at the other titles, it looks like I
won't come up with anything new."
Thus, not only did this participant not explicitly corroborate this item
through an independent source; he also did not recognize a conflict in what he
had read. This indicates that recognizing conflicts is a non-trivial task. It
requires directing attention to interpreting that item of information,
remembering what had been read in other articles, and recognizing that the
descriptions are incompatible. In the electronic environment, this task is
particularly challenging because space limitations on the computer screen allow
only one report to be viewed at a time. Furthermore, the participant was
unaware of conflicts in data that he had read, and he had no way to tell whether
there were conflicting descriptions in documents that he had not opened, or even
in reports that were available in the database but not returned by his query.

In contrast, participant 7 described the cause of the abnormal rocket swivel as
diagnostic information interpreted as command data (Figure 17). The two
explanations were incompatible: participant 7's description said that there was
no command data at all because the guidance platforms had shut down, whereas
participant 6's description said that there was command data, but that it was
incorrect because the guidance platforms had been reset mid-flight.
Participant 7 recognized the conflict between the descriptions in documents 858
and 1440 and resolved it based on a judgment of source quality. He decided to
base his analysis on the description in 1440 because it was published later and
was therefore more likely to contain all the information, was not a translation,
and came from a more authoritative source. Note, however, that even though this
was the correct judgment to make, he did not notice that a previously opened
article corroborated the hypothesis that he selected, which would have made the
judgment easier. This
would have been particularly helpful in this case because, as he pointed out:
"[The inaccurate description] sounds good." The description that was inaccurate
was written in a way that sounded as if the reporter had sufficient technical
expertise to understand the cause in detail. If he had only read article 858 and
not found the conflicting descriptions, it is likely that he would have believed the
inaccurate description.

Surprisingly, most of the study participants did not consistently employ
strategies to reduce inaccuracies in their analytic products during the
simulated task. By comparison, Guerlain et al. (1999) have described how expert
blood bankers performing antibody identification collect independent,
converging evidence both to confirm the presence of hypothesized antibodies
and to rule out all other potential antibodies. When asked, the study participants
described and, in some cases, demonstrated strategies to protect against the
vulnerability of incorporating inaccurate information in their analytic products.
On the whole, however, the study participants either did not use these
strategies or used only greatly reduced versions of them during the simulated
task, and they reported that under high workload conditions they tended to do
the same in the workplace. One likely explanation is that the strategies were highly
resource-intensive, such as printing out and iteratively using highlighter pens on
specific themes to check that information was corroborated from multiple,
independent sources. In addition, these strategies were generally not easy to
perform within the electronic environment. These observations point to design
concepts that would allow the easy manipulation, viewing, and tagging of small
text bundles, as well as aids for identifying, tracking, and revising judgments
about relationships between data.
3.3.4.3 Relying on outdated information
The third source of inaccurate statements was outdated information that once
had been considered correct but then later had been overturned when new
information became available. This type of "inaccurate" information was much
more difficult to detect and resolve than misunderstandings by report writers.
There were descriptions that were considered accurate at one point in time but
that greatly differed from updated descriptions at later points in time. Because
the "findings," or data set on which to base an analysis, came in over time, there
was always the possibility of missing information, released after the report
being read, that overturned earlier information or rendered it "stale."
This occurred both for descriptions of past events where the information
about the event came in over time as well as for predictions about future events
that changed as new information became available on which to base the
predictions. When these updates occurred on themes that were not central
enough to be included in report titles or newsworthy enough to generate a flurry
of reports, it was very difficult to know if updates had occurred or where to look
for them.
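If descriptions were tagged by theme and publication date, flagging potentially stale information would reduce to a date comparison per theme, as in the hypothetical sketch below. The theme labels, the function name, and all dates other than June 5, 1996 are invented for illustration. As the passage notes, the difficulty in practice is that updates on minor themes are not reflected in titles, so such theme tags would not come for free.

```python
from datetime import date

def stale_themes(relied_on, available):
    """Flag themes for which a newer description exists than the one relied upon.

    relied_on: theme -> publication date of the description the analyst used
    available: theme -> publication dates of all descriptions in the database
    """
    flagged = {}
    for theme, used in relied_on.items():
        newer = [d for d in available.get(theme, []) if d > used]
        if newer:
            flagged[theme] = max(newer)  # most recent update the analyst missed
    return flagged

relied_on = {"theme A": date(1996, 6, 5), "theme B": date(1996, 6, 5)}
available = {
    "theme A": [date(1996, 6, 5), date(1996, 7, 15)],  # hypothetical later update
    "theme B": [date(1996, 6, 4), date(1996, 6, 5)],
}
print(stale_themes(relied_on, available))
# {'theme A': datetime.date(1996, 7, 15)}
```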
To illustrate how easy it is to fall prey to relying on outdated information,
consider the process that study participant 6 employed (Figure 18) to come to the
conclusion in his verbal briefing that the Cluster satellite program had been
discontinued as a result of the Ariane 501 accident: "The immediate impact were
that the solar wind experiment was destroyed. They couldn't afford to build any
more satellites so they couldn't pursue that anymore." From a global
perspective, this statement is inaccurate: later updates overturned this initial
assessment of the impacts, and the Cluster satellite program was fully
reinstated.
Essentially, participant 6 did not open any documents that contained updates on
the impact to the Cluster satellite program. The participant opened seven
documents during the analysis. Only two of the documents contained predictions
of how the Ariane 501 failure would affect the Cluster satellite program. In
the first description, a scientist working on the project directly stated that
the project would be discontinued. While reading this report, the participant
verbalized that the scientific mission was dead and that the experiment was
destroyed. The second description was vaguer about the impact and did not
directly make any predictions, but it could be viewed as weakly converging
evidence that the Cluster satellite program would be discontinued. Given this
process, it is not surprising that the participant's verbal briefing included a
description similar to the one from the June 5, 1996 article: that the
experiment was destroyed and that the program would no longer be pursued. In
this case, the participant employed the strategy of corroborating information
from two independent, authoritative sources (which would have protected against
the first two sources of inaccuracies), incorporated it into the analysis, and
yet missed later updates that rendered that information inaccurate.

As a result of basing their analyses on "stale" information that had been
overturned by later updates, study participants made several inaccurate
statements of varying importance. The vulnerability to missing critical
information is particularly troubling because it is so difficult for
practitioners to determine when they have missed critical information. It is the
absence of information, either from not sampling it or from having attention
directed to a different theme while reading a document, that creates the
vulnerability.
3.3.5 Summary of Observed Behavior and Design Implications
By observing expert intelligence analysts on a relatively complex, face-valid task
using a baseline set of querying and browsing tools similar to what is available to
them in their workplaces, we were able to greatly increase our understanding of
the challenges of intelligence analysis. Under the extreme conditions of a short
timeframe of several hours in a new topic area with a database and question
unfamiliar to the analysts, we observed behaviors across most or all of the study
participants that pointed to design recommendations (Table 5).

First, several of the study participants expressed uneasiness because they were
unaware of what was potentially available in the database provided to them. In
addition, an expert analyst remarked that, interestingly, none of the study
participants explicitly attempted to characterize the database at any point
during the analysis by performing multiple queries to see what was returned.
The desire to evaluate the quality and type of information returned by a query
against what is potentially available might explain why all analysts create
personal databases on the topic areas for which they are responsible. When
analysts are asked questions about a new topic area, they lose this ability to
calibrate expectations about what is returned in comparison with what is
potentially available. These observations point to several design
recommendations. Specifically, information visualizations could be created that
would allow interactive, real-time exploration of the characteristics of subsets
of data. Although several existing software packages attempt to do this, the
only feedback about the characteristics of the returned dataset in the tools
provided to the study participants was the number of returned hits.
Second, all of the study participants were observed to narrow in on a small
portion of the dataset and to conduct all of their further searching within
that space. This observation leads to possible recommendations to encourage
analysts to explore other portions of the database, either by explicit machine
recommendations or through interface designs that naturally suggest how much of
a set of potential data has been explored. In addition, visualizations that
allow easier browsing of larger sets of documents could allow analysts to
narrow to larger sets, so that sampling by dates and titles would inherently
cover more of the database.
Third, there appeared to be an interaction between the order in which documents
were selected from the browser window and the time and effort spent reading
them. The first or second document that an analyst selected for detailed reading
seemed to frame how the rest of the information was later interpreted. This
observation indicates the importance of quickly providing high quality
documents to an analyst. In addition, when later documents are quickly
browsed, data conflicts, updates, and new information could be highlighted to
reduce the chances that they are missed.
Fourth, although several study participants verbalized that they should conduct
a new search, they did not, and appeared reluctant to leave the working area that
the home query window represented. By that point, participants had become
familiar with the titles and dates of the documents returned by the query, had
often sorted the documents by date, had resized the windows and placed them in
dedicated positions on the screen, and had marked some of the documents for
various reasons. This observation leads to recommendations for carrying
highlights and "trails" of previously opened documents over to new query returns.
In addition, query formulations were relatively difficult to manipulate in the
interface, so exploring "what if" changes to a query would require running a
separate query for each formulation and then comparing the numbers of hits
returned.
Fifth, the study participants who located "high profit" documents made fewer
inaccurate statements in their verbal briefings than those who found none. If the
difference between the two groups is in fact explained by the amount of time spent
and the number of documents opened, then, given a baseline electronic toolset of
keyword querying and browsing by dates and titles, one way to find the high profit
documents in the database might be to cast a wider net by sampling more, either
by performing more queries or by opening more documents. Support tools such as
"agents" that remind analysts to broaden their sampling strategies, or critique
them for not doing so, might be helpful. However, given the increasing
organizational pressures to do analyses more efficiently, such tools might be
ineffective because analysts might not have the resources to act on them. A
potentially more viable design intervention to reduce the vulnerability to
missing high profit documents would be to use machine intelligence as a
"recommender" system to suggest likely candidates for high profit documents. For
example, for this scenario, a high profit document could be characterized as:
1) a relatively long document that was released several months after the original
event (and certainly after the Inquiry Board Report was officially released by the
European Space Agency), 2) from a credible source on rocket launcher and satellite
technologies such as Aviation Week and Space Technology, 3) not an abstract,
4) not reporting information from another news agency (i.e., not "secondhand"),
5) not translated from another language, and 6) a report that had been opened
several times by others.
Sixth, some participants missed important events while searching for information,
such as the launch of the next rocket in the series, Ariane 502. It was observed
that, in the documents returned by the study participants' queries, reports
clustered around three times: the 501 launch failure, the release of the Inquiry
Board Report, and the next launch in the series, 502. This observation led to the
idea that disrupting events could be made visually emergent from a display,
becoming an implicit cue to where an analyst should look for informative data.
Seventh, breakdowns were observed in the process of resolving discrepancies in
the data, such as failing to identify discrepancies in information that was read
and double-counting information from the same source. These observations led
to the concept of aids for identifying, selecting, manipulating and tracking
judgments about conflicts and corroborations in data.
Eighth, many study participants were observed to devote considerable time and
effort to methodically tracking which document each piece of information came
from (e.g., in a word processing program, copying source information into
footnotes attached to text excerpted from a particular document). In some cases,
study participants stated that they had forgotten where information came from
or were uncertain whether information was new or a repeat from reading the
same document again.
Ninth, study participants were observed to make inaccurate statements because
they missed updates that overturned information once considered accurate. In
addition, many of the study participants had difficulty identifying discrepancies
between predictions about when events would occur when those predictions were
expressed in text descriptions, such as "a few months from now" in one report at
one time and "delayed for several months" in another report at a different time.
This observation led to the notion of visualizing this information on two parallel
timelines, connecting the document date on one timeline with the predicted event
date on the other, to facilitate recognizing patterns such as conflicting
predictions and slips in predicted times. Aids that remind users to search for
updates, and that suggest where to look for updates based on similarity matches
to text descriptions and other attributes, could also be valuable.
Finally, we were surprised by the wide variation in the estimates of their own
accuracy that study participants gave immediately following their verbal
briefings. It appears to be extremely difficult to establish a sound basis for a
confidence estimate, given that there could always be missed information that
would greatly alter or overturn the analysis. Analysts clearly need support in
identifying potential "holes" in the analysis process, arising both from missing
information and from unresolved issues. Visualizations that represent the state
of the analytic process might help improve analysts' ability to calibrate
assessments of their own accuracy, including displays that show what
information has been sampled and assembled, as well as information that has
been "tagged" or "bookmarked" as a reminder to return and resolve open
questions.
As we had learned previously in interviews, the observed behavior during the
study indicated that the baseline computer support tools left most of the
challenging tasks in conducting analyses under data overload conditions
unsupported or only weakly supported. We believe that the relationship
between the challenges in inferential analysis based on sampling uncertain and
conflicting data and the support provided by the baseline electronic environment
is likely the primary explanation for these patterns of observed behavior across
study participants. The observed behavior left the study participants vulnerable to
making incomplete and inaccurate statements in their verbal briefings. These
observations point to new directions for computerized support for these
processes.