From Speech to Text
Before turning to the analysis of the knowledge constructed in the
interview interaction, I will address the transcription of interviews.
Rather than being a simple clerical task, transcription is itself an
interpretative process. Whereas the interaction of the interview situ-
ation has been extensively treated in the literature on method, the
translation from oral conversations to written texts has received less
attention. This chapter addresses the procedures for making interview
conversations accessible to analysis: taping the oral interview interaction,
transcribing the tapes into written texts, and using computer
programs to assist the analysis of the interviews. The practical
problems of transcription raise theoretical issues about the differences
between oral and written language, which leads to the rather neglected
position of language in interview research.
Recording Interviews
Methods of recording interviews for documentation and later
analysis include audiotape recording, videotape recording, note tak-
ing, and remembering. The usual way of recording interviews today
is with a tape recorder. The interviewer can then concentrate on the
topic and the dynamics of the interview. The words and their tone,
pauses, and the like, are recorded in a permanent form that can be
returned to again and again for relistening. The audiotape gives a
decontextualized version of the interview, however: It does not include
the visual aspects of the situation, neither the setting nor the
facial and bodily expressions of the participants.
A videotape recorder will encompass the visual aspects of the
interview. With the inclusion of facial expressions and bodily posture,
a videotape provides richer contexts for interpretations than does
audiotape. Video recordings offer a unique opportunity for analyzing
the interpersonal interaction in an interview, an aspect that has led to
extensive use of videos in research on, and training for, therapy.
The wealth of information makes videotape analysis a time-
consuming process. For most interview projects, particularly those
with many interviews and where the main interest is the content of
what is said, video recordings may be too cumbersome for analysis. A
video is useful for the training of interviewers, making them aware of
their facial and bodily expressions during an interview that could
either inhibit or promote communication. The same is true of subtle
ways of reinforcing specific types of answers by nods, smiles, and
bodily postures that the interviewer may not be aware of and that are
not recorded on the audiotape.
It should be noted that the inclusion of the visual setting does not
solve the issue of an objective representation of the interview situ-
ation. Researchers who use videotape recordings are today rather
sensitive to the constructive natures of their documentation, which
are products of the researcher's many choices of angles and framing,
as well as the sequence of shots (see, e.g., Harel & Papert, 1991).
An interview may also be recorded through a reflected use of the
researcher's subjectivity and remembering, relying on his or her em-
pathy and memory and then writing down the main aspects of the
interview after the session, sometimes assisted by notes taken during
the interview. There are obvious limitations to a reliance on memory
for interview analysis, such as the rapid forgetting of details and the
influence of a selective memory. The interviewer's immediate memory
will, however, include the visual information of the situation as well
as the social atmosphere and personal interaction, which to a large
extent is lost in the audiotape recording. The interviewer's active
listening and remembering may ideally also work as a selective filter,
retaining those very meanings that are essential for the topic and
purpose of the study.
While remembering is today often decried as a subjective method
replete with biases, it should not be overlooked that the main empirical
basis of psychoanalytic theory came from the therapist's empathic
listening to and remembering of therapeutic interviews. Freud devel-
oped his psychoanalytical theory at a time when tape recorders did not
exist. He refrained from taking notes during the therapeutic hours and
listened with an even-hovering attention, attended to the meaning of
what was said, and first made notes after the therapeutic session
(Freud, 1963). This form of recollection is based on active listening
during the situation; it requires sensitivity and training, which inter-
view researchers today may forgo, treating the tapes and transcripts as
their real data. One might speculate that if tape recorders had existed
in Freud's time, psychoanalytical theory might not have developed
beyond infinite series of verbatim quotes from the patients, and
psychoanalysis might today have remained confined to a small Vien-
nese sect of psychoanalysts lost in a chaos of tapes and transcriptions
from their therapies.
Taping. In the present context, the most common method of recording
interviews today, audiotape recording and subsequent transcription,
will be treated more extensively. The first requirement for
transcribing a recorded interview is that it was in fact recorded. Some
interviewers have painful memories of an exceptional interview where
nothing got on the tape due to technical faults or, most often, human
error. The interviewer may have been so caught by the newness and
complexities of the interview situation that he or she simply forgot to
turn the recorder on, or a special interview may have been so engaging
that any thought of technicalities was lost.
A second requirement for transcription is that the conversation on
the tape is audible. A good tape recorder and microphone are basic
requirements. So is finding a room without background noise such as
voices in neighboring rooms and heavy outside traffic. To secure good
recording quality it is necessary that the microphone is close enough
to both participants; that the interviewer is not afraid to ask a mum-
bling interviewee to speak up; and that the transcriber's coming work
is kept in mind, for example by avoiding coffee cups and the like hitting
the table, sending bolts of thunder into the transcriber's ears (see Yow
[1994] and Poland [1995] for more extensive treatments of the recording
quality of interviews).
Transcription Reliability and Validity
Interviews are today seldom analyzed directly from tape recordings.
The usual procedure for analyzing is to have the taped interviews
transcribed into written texts. Although this seems a simple
and reasonable procedure, transcription involves a series of
methodical and theoretical problems. For example, once the interview
transcriptions are made, they tend to be regarded as the solid empirical
data in the interview project. The transcripts are, however, not the
rock-bottom data of interview research; they are artificial constructions
from an oral to a written mode of communication. Every transcription
from one context to another involves a series of judgments
and decisions. I will introduce the constructive nature of transcripts
by taking a closer look at their reliability and validity.
Reliability. Questions of interviewer reliability in interview research
are frequently raised. Yet in contrast to sociolinguistic research,
transcriber reliability is rarely mentioned for social science interviews.
Technically regarded, it is an easy check to have two persons inde-
pendently type the same passage of a taped interview, and then have
a computer program list and count the number of words that differ
between the two transcriptions, thus providing a quantified reliability
check.
The interpretational character of transcription is evident from the
two transcripts of the same tape recording in Table 9.1. The words that
are different in the two transcriptions are italicized. The transcriptions
were made by two psychologists who were instructed to transcribe as
accurately as possible. Still, the transcribers adopted different styles:
Transcriber A appears to write more verbatim, includes more words,
and seems to guess more than transcriber B, who records only what is
clear and distinct, and who also produces a more coherent written
style. The most marked discrepancy between the two is rendering the
interviewer's question as "because you don't get grades?" versus "of
course you don't like grades?" It thereby becomes ambiguous what
the subject's answer ("Yes, I think that's true ...") refers to.
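The word-count reliability check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: the transcripts are taken to be plain strings, case and punctuation are ignored, and the two word sequences are aligned with Python's standard difflib module. The function names are hypothetical. The sample input is the interviewer's question from Table 9.1.

```python
import difflib
import re


def words(text):
    """Lowercase the text and return its words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def compare_transcripts(text_a, text_b):
    """Align two transcripts word by word and collect the words that differ.

    Returns (words only in A, words only in B, agreement ratio from 0 to 1).
    """
    words_a, words_b = words(text_a), words(text_b)
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    only_a, only_b = [], []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            only_a.extend(words_a[a1:a2])
            only_b.extend(words_b[b1:b2])
    return only_a, only_b, matcher.ratio()


# The interviewer's question as rendered by transcribers A and B (Table 9.1).
question_a = "And are you also saying because you don't get grades? Is that true?"
question_b = "And are you also saying that of course you don't like grades?"

only_a, only_b, agreement = compare_transcripts(question_a, question_b)
print("Only in A:", only_a)
print("Only in B:", only_b)
print("Words differing:", len(only_a) + len(only_b))
print("Agreement ratio:", round(agreement, 2))
```

A count of this kind shows where two transcribers diverge; it cannot say which of the two renderings is the better one.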
The quality of transcriptions can be improved by clear instructions
about the procedures and purposes of the transcriptions, preferably
accompanied by a reliability check. Yet even with detailed typing
TABLE 9.1 Two Transcriptions of the Same Interview Passage
Transcription A:
I: And are you also saying because you don't get grades? Is that true?
S: Yes, I think that's true because if I got grades I would work toward the grade as
opposed to working toward ... umm, expanding what I know, or, pushing a limit
back in myself or, something ... contributing new ideas ...
Transcription B:
I: And are you also saying that of course you don't like grades?
S: Yes, I think that's true, because if I got grades I would work toward the grade as
opposed to working toward expanding what I know or pushing those limits back ...
(tape unclear) contributing new ideas.
instructions it may be difficult for two transcribers to reach full
agreement on what was said. Listening again to the tape might show
that some of the differences are due to poor recording quality and
mishearing. Other differences, which are of interest from an inter-
relational perspective, may not be unequivocally solved, as for exam-
ple: Where does a sentence end? Where is there a pause? How long
is a silence before it becomes a pause in a conversation? Does a spe-
cific pause belong to the subject or to the interviewer? And if the
emotional aspects of the conversation are included, for instance "tense
voice," "giggling," "nervous laughter," and so on, the intersubjective
reliability of the transcription could develop into a research project of
its own.
Validity. Ascertaining the validity of the interview transcripts is
more complex than assuring their reliability. The issue of what a valid
transcription is may be exemplified by two different transcriptions of
a story told by a 7-year-old Afro-American pupil (see Table 9.2). The
two transcriptions are from a segment of a longer story from a
classroom exercise, transcribed by two different researchers and discussed
by Mishler (1991). Transcript A is a verbatim rendering of the
oral form of the story; the school teacher found the whole story
disconnected and rambling, not living up to acceptable criteria of
coherence and language use. Transcript B is an idealized realization
of the same story passage, retranscribed into a poetic form by a
researcher familiar with the linguistic practices of black oral style.
TABLE 9.2 Two Transcriptions of Leona's Story of Her Puppy
Transcription A:
... and then my puppy came / he was asleep / and he was-he was /
he tried to get up / and he ripped my pants / and he dropped the oatmeal-
all over him / and / my father came / and he said
Transcription B:
an' then my puppy came
he was asleep
he tried to get up
an' he ripped my pants
an' he dropped the oatmeal all over him
an' my father came
an' he said
...
SOURCE: From Mishler (1991).
Here the story appears as a literary tour de force, yielding a remarkable
narrative. Neither transcription is more objective than the other; they
are, rather, different written constructions from the same oral passage:
"Different transcripts are constructions of different worlds, each
designed to fit our particular theoretical assumptions and to allow us
to explore their implications" (Mishler, 1991, p. 271).
Transcribing involves translating from an oral language, with its
own set of rules, to a written language with another set of rules.
Transcripts are not copies or representations of some original reality;
they are interpretative constructions that are useful tools for given
purposes. Transcripts are decontextualized conversations; they are
abstractions, as topographical maps are abstractions from the original
landscape from which they are derived. Maps emphasize some aspects
of the countryside and omit others, the selection of features depending
on the intended use. Maps of the same topographical area for purposes
of driving, aviation, agriculture, and mining will tend to be rather
different. An objective map representing, for example, the island of
Greenland does not exist: The shape depends on the selected mode
of projection from a curved to a flat plane, which again depends on
the intended use of the map.
Correspondingly, the question "What is the correct transcription?"
cannot be answered: there is no true, objective transformation from
the oral to the written mode. A more constructive question is: "What
is a useful transcription for my research purposes?" Thus verbatim
descriptions are necessary for linguistic analyses; the inclusion of
pauses, repetitions, and tone of voice are relevant for psychological
interpretations of, for example, level of anxiety or the meaning of
denials. Transforming the conversation into a literary style facilitates
communication of the meaning of the subject's stories to readers.
Oral and Written Language
By neglecting issues of transcription, the interview researcher's
road to hell becomes paved with transcripts. The interview is an
evolving conversation between two people. The transcriptions are
frozen in time and abstracted from their base in a social interaction.
The lived face-to-face conversation becomes fixated into transcripts.
A transcript is a transgression, a transformation of one narrative
mode, oral discourse, into another narrative mode, written discourse.
To transcribe means to transform, to change from one form
to another. Attempts at verbatim interview transcriptions produce
hybrids, artificial constructs that are adequate to neither the lived oral
conversation nor the formal style of written texts. Transcriptions are
translations from one language to another; what is said in the hermeneutical
tradition of translators also pertains to transcribers: traduttore,
traditore (translators are traitors).
The different rhetorical forms of oral and written language are
frequently overlooked during the transcription of social science inter-
views; one exception is Poland (1995). Recognizing the socially
constructed nature of the transcript, he discusses in detail procedures
for increasing the trustworthiness of transcripts and thus enhancing
rigor in qualitative research. Sociolinguistics and ethnomethodology
have brought the differences between oral and written language into
focus (Ong, 1982; Tannen, 1990; Tedlock, 1983). In a historical
linguistic study, in particular of Homer's work, Ong outlines the
thought and expression of a primarily oral culture as being close to
the human life world, situational, empathic and participatory, addi-
tive, aggregative, agonistic, and redundant. In contrast, a written
culture is characterized by analytic, abstract, and objectively distanced
forms of thought and expression.
Interview transcriptions are often boring to read; ennui ensues in
face of the repetitions, the incomplete sentences, and the many digres-
sions. The apparently incoherent statements may be coherent within
the context of a living conversation, with vocal intonation, facial
expressions, and body language supporting, giving nuances to, or even
contradicting what is said. Such discrepancies between what is said
and the accompanying bodily expressions are deliberately used in
some forms of comical and ironical statements.
The problems with interview transcripts are due less to the techni-
calities of transcription than to the inherent differences between an
oral and a written mode of discourse. Transcripts are decontextualized
conversations. If one accepts as a main premise of interpretation that
meaning depends on context, then transcripts in isolation make an
impoverished basis for interpretation. An interview takes place in a
context, of which the spatial, temporal, and social dimensions are
immediately given to the participants in the face-to-face conversation,
but not to the out-of-context reader of the transcript. In contrast to a
taped interview, a novel will report the immediate context of a
conversation, including nonverbal communication to the extent the
author finds it relevant for the story he or she wants to tell. Similar
considerations hold for journalistic interviews.
The transcriptions are detemporalized; a living, ongoing conversa-
tion is frozen into a written text. The words of the conversation,
fleeting as the steps of an improvised dance, are fixated into static
written words, open to repeated public inspections. The words of the
transcripts take on a solidity that was not intended in the immediate
conversational context. The flow of conversation, with its open hori-
zon of directions and meanings to be followed up, is replaced by the
fixated, stable written text.
In a conversation we normally have immediate access to the meaning
of what the other says. When analyzing the interviews, the tape
recording, and in particular the ensuing transcript, tends to become
an opaque screen between the researcher and the original situation.
Attention is drawn to the formal recorded language and the empathi-
cally experienced, lived meanings of the original conversation fade
away; the dried pale flowers in the herbarium replace the fresh
colorful flowers of the field. The transcripts become a kind of funda-
mental verbal data for interview research, rather than a means to evoke
and revive the personal interaction of the interview situation.
The rather interpretative basis of the transcripts is often forgotten
in the analysis, where the transcripts tend to become a rock-bottom
basis for the ensuing interpretations. Ignorance of the many technical
and theoretical issues of transforming conversations into texts may be
due to a neglect in social science of the linguistic medium of interview
research. Social scientists are today naive users of the language that
their professional practice and research rests on. Although most social
science programs today require courses in statistical analysis of quan-
titative data, even a rudimentary introduction to linguistic analysis of
linguistic, qualitative data is a rarity.
"Not being able to rely on a conception of a stable, universal,
noncontextual, and transparent relation between representation and
reality, and between language and meaning, confronts researchers
with serious and difficult theoretical and methodological problems"
(Mishler, 1991, p. 278). Neglecting linguistic complexities during
transcription from an oral to a written language may be related to a
philosophy of naive realism, with an implicit constancy hypothesis of
some real meaning nuggets remaining constant by their transfer from
one context to another. In contrast, postmodern conceptions of
knowledge emphasize the contextuality of meaning with an intrinsic
relation of meaning and form, and focus on the very ruptures of
communication, the breaks of meaning. The nuances and the differ-
ences, the transformations and discontinuities of meaning become the
very pores of knowledge. Postmodern approaches to knowledge do
not solve the many technical and theoretical issues of transcription.
The emphasis on the linguistic constitution of reality, on the contex-
tuality of meaning, and on knowledge as arising from the transitions
and breaks, however, involves a sensitivity to and a focus on the often
overlooked transcription stage of interview research.
Transcribing Interviews
Transcribing the interviews from an oral to a written mode struc-
tures the interview conversations in a form amenable for closer
analysis. Structuring the material into texts facilitates an overview and
is in itself a beginning analysis. The amount and form of transcribing
depends on such factors as the nature of the material and the purpose
of the investigation, the time and money available, and (not to be
forgotten) the availability of a reliable and patient typist. Transcription
from tape to text involves a series of technical and interpretational
issues for which, again, there are few standard rules, but rather a series
of choices to be made.
It is a useful exercise for interviewers to type one or more pilot
interviews themselves. This will sensitize them to the importance of
the acoustic quality of the recording, to paying attention to asking
clear audible questions and getting equally clear answers in the
interview situation. The transcribing experience will also make inter-
viewers aware of some of the many decisions involved in transforming
oral speech to written texts, and it will give an impression of the time
and effort the transcription of an interview requires.
Typing. The time needed to transcribe an interview will depend on
the quality of the recording, the typing experience of the transcriber,
and the demands for detail and exactitude. Transcribing large amounts
of interview material is often a tiresome and stressing job; the stress
can be reduced by securing recordings of high acoustic quality.
For the interviews in the grading study, an experienced secretary
took about 5 hours to type verbatim an interview of 1 hour. A 1-hour
interview results in 20 to 25 single-spaced pages, depending on the
amount of speech and how it is set up in typing.
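For planning purposes, the grading-study figures can be scaled to a whole project. The sketch below assumes that the ratios reported above (about 5 typing hours and 20 to 25 single-spaced pages per recorded hour) carry over linearly to other material, which they may not: recording quality, typing experience, and the level of detail demanded will all shift them. The function name and the 30-interview project are hypothetical.

```python
def estimate_transcription(recorded_hours, typing_hours_per_hour=5.0,
                           pages_per_hour=(20, 25)):
    """Scale the grading-study figures linearly to a body of recordings.

    Returns (typing hours, minimum pages, maximum pages).
    """
    typing_hours = recorded_hours * typing_hours_per_hour
    min_pages = recorded_hours * pages_per_hour[0]
    max_pages = recorded_hours * pages_per_hour[1]
    return typing_hours, min_pages, max_pages


# Hypothetical project: 30 interviews of 1 hour each.
typing_hours, min_pages, max_pages = estimate_transcription(30)
print(f"{typing_hours:.0f} typing hours, {min_pages} to {max_pages} pages")
```

Under these assumptions, 30 hours of tape come to 150 hours of typing and 600 to 750 pages of text to be read and analyzed.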
Who Should Transcribe? In most studies the tapes are transcribed
by a secretary, who is likely to be more efficient at typing than the
researcher. Investigators who emphasize the modes of communication
and linguistic style may choose to do their own transcribing in order
to secure the many details relevant to their specific analysis. Some have
a typist do a first transcription of all the interviews in a study; then
after reading them through, the researcher goes back and retypes those
interviews, or those parts of the interviews, that will be subjected to
intensive analysis.
Style. There is one basic rule in transcription: state explicitly in
the report how the transcriptions were made. This should preferably
be based on written instructions to the transcribers. If there are several
transcribers for the interviews of a single study, care should be taken
that they use the same procedures for typing. If this is not done, cross-
comparisons among the interviews will be difficult to make.
Although there is no standard form or code for transcription of
research interviews, there are some standard choices to be made. They
involve such issues as: Should the statements be transcribed verbatim
and word by word, including the often frequent repetitions, or should
the interview be transformed into a more formal, written style? Should
the entire interview be reproduced verbatim, or should the transcriber
condense and summarize some of the parts that have little relevant
information? Should pauses, emphases in intonation, and emotional
expressions like laughter and sighing be included? And if pauses are
to be included, how much detail should be indicated?
There are no correct, standard answers to such questions; the
answers will depend on the intended use of the transcript. One
possible guideline for editing, doing justice to the interviewees, is to
imagine how they themselves would have wanted to formulate their
statements in writing. The transcriber then on behalf of the subjects
translates their oral style into a written form in harmony with the
specific subjects' general modes of expression. The extent of detail in
a transcription will depend on its use; regarding pauses, for example,
it may be sufficient for some purposes simply to note "a short pause"
or "a long pause," whereas for detailed sociolinguistic analyses the
length of a pause will be indicated in milliseconds.
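The two levels of detail for pauses can be illustrated with a small sketch. It assumes that each stretch of speech has been timed in milliseconds, which ordinary typing from tape does not provide, and the thresholds separating a short from a long pause are arbitrary illustrative values, not an established convention. The function names and the timings in the example are hypothetical; only the wording is taken from Table 9.1.

```python
def describe_pause(gap_ms, exact=False, short_ms=500, long_ms=2000):
    """Return a transcript annotation for a silence of gap_ms milliseconds.

    The default thresholds are illustrative assumptions only.
    """
    if exact:
        return f"(pause {gap_ms} ms)"
    if gap_ms < short_ms:
        return ""  # too brief to note in a coarse transcript
    if gap_ms < long_ms:
        return "(short pause)"
    return "(long pause)"


def annotate(segments, exact=False):
    """Join timed segments into one line, inserting pause notes between them.

    Each segment is a tuple (start_ms, end_ms, text).
    """
    parts = []
    previous_end = None
    for start_ms, end_ms, text in segments:
        if previous_end is not None:
            note = describe_pause(start_ms - previous_end, exact=exact)
            if note:
                parts.append(note)
        parts.append(text)
        previous_end = end_ms
    return " ".join(parts)


# Hypothetical timings attached to wording from Table 9.1.
segments = [
    (0, 1400, "Yes, I think that's true"),
    (2300, 4100, "because if I got grades"),
    (7000, 9500, "I would work toward the grade"),
]

print(annotate(segments))              # coarse style
print(annotate(segments, exact=True))  # detailed style
```

The same recording thus yields two different transcripts, and which one is appropriate depends on the intended use.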
Decisions concerning style of transcription depend on the audience
for which a text is intended. For the investigator, as an aid in remembering
the interviews? For the interview subjects, to confirm that their
views are adequately rendered in the interview and possibly also as an
invitation to expand upon what they have said? For a research group
that will make extensive analyses of the interviews, or for critical
colleagues who want to check the basis on which the researcher draws
his or her conclusions? Or for general readers who want some concrete
illustrations from the interviews?
The decisions about style of transcribing depend on the use of the
transcriptions. If they are to give some general impressions of the
subjects' views, rephrasing and condensing of statements may be in
order. Also, if the analysis is to be in a form that categorizes or
condenses the general meaning of what is said, a certain amount of
editing of the transcription may be desirable. If, however, the tran-
scriptions are to serve as material for sociolinguistic or psychological
analysis, they need to be in a detailed, verbatim form. Even the many
"hm"s of an ordinary conversation, disturbing when reading a transcript,
can be relevant for later analysis: for example, whether the
"hm"s of the interviewer selectively follow, and thus reinforce, special
types of answers by the subject. And, if psychological interpretations
are to be made, the emotional tone of the conversation should also be
included. Here the very pauses, repetitions, and so forth may yield
important material for interpretation.
In Jacobsen's (1981) study of the university socialization of students
of Danish and of medicine to their respective professional cultures,
the interviews were transcribed verbatim, including the many "hm"s,
"ain't it true," and the like. Jacobsen counted the use of such fillers
by the students of Danish and of medicine, respectively, and found a
markedly more frequent use of "ain't it true" by the students of
Danish. He interpreted this, together with other indications, as being
in line with the culture of the humanities, in which there is an emphasis
on dialogue with attempts to obtain consensual validation of interpre-
tations, involving appeals to the others, such as "ain't it true." In
contrast, the medical profession is more characterized by lectures as
monologues authoritatively stating nondebatable truths.
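A count of this kind presupposes verbatim transcripts and some way of making groups with different amounts of speech comparable. The following sketch is not Jacobsen's procedure, which is not described here; it merely assumes that a filler is counted as a whole phrase and expressed per 1,000 words. The function name is hypothetical, and the two sample passages are invented for illustration.

```python
import re


def filler_rate(text, filler, per_words=1000):
    """Return occurrences of a filler phrase per `per_words` words of text."""
    lowered = text.lower()
    total_words = len(re.findall(r"[a-z']+", lowered))
    # Match the phrase as whole words only.
    pattern = r"\b" + re.escape(filler.lower()) + r"\b"
    hits = len(re.findall(pattern, lowered))
    return hits * per_words / total_words if total_words else 0.0


# Invented sample passages, not taken from any actual interview.
danish_sample = ("Well, the poem is about loss, ain't it true, "
                 "and the ending is open, ain't it true?")
medicine_sample = ("The diagnosis follows from the symptoms. "
                   "The treatment is standard.")

for label, sample in [("Danish", danish_sample), ("Medicine", medicine_sample)]:
    rate = filler_rate(sample, "ain't it true")
    print(f"{label}: {rate:.1f} per 1,000 words")
```

Had the transcriber tidied the fillers away, there would be nothing left to count.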
The issue of how detailed a transcription should be is also illus-
trated by an interview sequence on competition for grades, which in
Denmark is a negative behavior that many pupils hesitate to admit to:
Interviewer: Does it influence the relationship between the pupils
that the grades are there?
Pupil: No, no-no, one does not look down on anyone who gets bad
grades, that is not done. I do not believe that: well, it may be
that there are some who do it, but I don't.
Interviewer: Does that mean there is no competition in the class?
Pupil: That's right. There is none.
At face value, this pupil says that one does not look down on pupils
with low grades and confirms the interviewer's interpretation that
there is no competition for grades in the class. A critical reading may
lead to the opposite conclusion: the boy himself introduces the
phenomenon of looking down on pupils with bad grades, first denies
that it occurs, then repeats the denials with three "no"s and four "not"s
in the few lines of his statement. This many denials of looking down
on other pupils might, with the quantitative increases, suddenly lead
to a qualitative change for the reader, and the statement come to mean
the opposite of what was manifestly said. If the above interview
statement had not been transcribed verbatim, but rephrased into a
briefer form such as "One does not look down on others with low
grades nor compete for grades," the reinterpretation of the manifest
meaning of the statement into its opposite could not have taken place.
The effect of multiple negations canceling each other out is used in
literature, in Hamlet, for example:
Hamlet: Madam, how like you this play?
Queen: The lady doth protest too much, methinks. (Hamlet, act III,
scene 2)
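The tally of denials in the pupil's statement can be reproduced directly from the verbatim wording. The short sketch below counts "no" as a separate word and counts "not" both as a separate word and as the contraction "n't"; that counting rule is an assumption chosen to match the reading given above.

```python
import re
from collections import Counter

# The pupil's first answer, as transcribed verbatim above.
statement = (
    "No, no-no, one does not look down on anyone who gets bad grades, "
    "that is not done. I do not believe that: well, it may be that there "
    "are some who do it, but I don't."
)

tokens = re.findall(r"[a-z']+", statement.lower())
counts = Counter(tokens)

no_count = counts["no"]
# Count "not" both as a word of its own and inside contractions such as "don't".
not_count = counts["not"] + sum(1 for token in tokens if token.endswith("n't"))

print(no_count, not_count)  # prints: 3 4
```

Run on the condensed version ("One does not look down on others with low grades nor compete for grades"), the same count would find a single "not", and the accumulation that invites the critical reading would be gone.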
Ethics. Transcription involves ethical issues. The interviews may
treat sensitive topics in which it is important to protect the confiden-
tiality of the subject and of persons and institutions mentioned in the
interview. Simple but sometimes forgotten tasks include the secure
storage of tapes and transcripts and the erasing of tapes when they
are no longer of use. In sensitive
cases, it may be advantageous as early as the transcription stage to
mask the identities of the interviewed subjects, as well as events and
persons in the interviews that might be easily recognized. This is
particularly important if a larger research group is involved and sev-
eral persons will therefore have access to the transcripts.
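Masking at the transcription stage can be partly mechanized when the transcript exists as a computer file. The sketch below assumes that the researcher keeps a list of the names to be masked together with their pseudonyms; the function name, the names, and the sample sentence are all hypothetical. It catches only what is on the list, so nicknames, misspellings, and recognizable events still have to be found by reading.

```python
import re


def mask_identities(transcript, replacements):
    """Replace each listed name with its pseudonym, matching whole words only.

    Longer names are handled first so that a full name is not left
    half-masked by a shorter entry.
    """
    for name in sorted(replacements, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        pseudonym = replacements[name]
        # A function is used as the replacement so the pseudonym is inserted literally.
        transcript = re.sub(pattern, lambda match: pseudonym, transcript)
    return transcript


# Hypothetical names and sentence, for illustration only.
replacements = {
    "Mr. Hansen": "[Teacher A]",
    "Hansen": "[Teacher A]",
    "Elm Street School": "[School 1]",
}
original = "Mr. Hansen at Elm Street School said that Hansen's class was difficult."

print(mask_identities(original, replacements))
```

The list linking names to pseudonyms is itself sensitive material and needs the same secure storage as the tapes.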
Some subjects may experience a shock as a consequence of reading
their own interviews. The verbatim transcribed oral language may
appear as incoherent and confused speech, even as indicating a lower
level of intellectual functioning. The subjects may become offended
and refuse any further cooperation and any use of what they have said.
If the transcripts are to be sent back to the interviewees, rendering
them in a more fluent written style might be considered from the start.
And if not, consider accompanying the transcripts with information
about the natural differences between oral and written language styles.
Be mindful that the publication of incoherent and repetitive verbatim
interview transcripts may involve an unethical stigmatization of spe-
cific persons or groups of people.
Those teachers in the grading study who had expressed interest
received a draft of the book chapter in which their statements were
discussed. A teacher of Danish, who had been quoted extensively,
called and asked me to omit or rephrase his statements in the book.
The rather off-the-cuff verbatim quotes from his interview showed a
very poor Danish used by a teacher of Danish, which he found embarrassing
in his profession. At that time I was little aware of the different rules
for oral and written language and believed that a verbatim transcrip-
tion of the interviews was the most loyal and objective transcription.
I did, however, respect his request and changed his quotes into a
correct written form, which also made them more readable.
Computer Tools for Interview Analysis
During the past decade, computer programs have been developed
to facilitate the analysis of interview transcripts. They replace the
time-demanding cut-and-paste approach to analysis of often hundreds
of pages of paper with "electronic scissors." The programs are aids for
structuring the interview material for further analysis; the task and
responsibility of interpretation still rest with the researcher.
The computer programs serve as textbase managers, storing the
often extensive interview transcripts, and allow for a multitude of
analytic operations (for overviews, see Tesch, 1990; Weitzman &
Miles, 1995; Miles & Huberman's, 1994, appendix gives a short
introduction to choosing among computer programs for qualitative
analysis). The programs allow for such operations as writing memos,
writing reflections on the interviews for later analyses, coding, search-
ing for key words, doing word counts, and making graphic displays.
Some of the programs allow for on-screen coding and note taking
while reading the transcripts.
The most common form of computer analysis today is coding, or
categorization, of the interview statements. The researcher reads
through the transcripts and categorizes the relevant passages; then
with code-and-retrieve programs the coded passages can be retrieved
and inspected again, with options of recoding and of combining codes.
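The principle of code-and-retrieve can be shown with a toy example. The sketch below does not reproduce the interface of any actual program; the class and method names are hypothetical, the passages are taken from the interview excerpts quoted earlier in this chapter, and the codes attached to them are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    interview: str  # which interview the passage comes from
    text: str       # the transcribed statement
    codes: set = field(default_factory=set)


class CodedTextbase:
    """A toy code-and-retrieve store for coded interview passages."""

    def __init__(self):
        self.passages = []

    def add(self, interview, text, *codes):
        """Store a passage together with the codes assigned to it."""
        passage = Passage(interview, text, set(codes))
        self.passages.append(passage)
        return passage

    def retrieve(self, *codes, match_all=True):
        """Return passages carrying all of the codes, or any if match_all is False."""
        wanted = set(codes)
        if match_all:
            return [p for p in self.passages if wanted <= p.codes]
        return [p for p in self.passages if wanted & p.codes]

    def recode(self, old, new):
        """Rename a code wherever it occurs."""
        for passage in self.passages:
            if old in passage.codes:
                passage.codes.remove(old)
                passage.codes.add(new)


base = CodedTextbase()
base.add("student", "if I got grades I would work toward the grade",
         "grades", "learning")
base.add("pupil", "one does not look down on anyone who gets bad grades",
         "grades", "peer relations")
base.add("pupil", "That's right. There is none.", "competition")

for passage in base.retrieve("grades"):
    print(passage.interview, "-", passage.text)

# Combining codes narrows the retrieval; recoding renames a category.
print(len(base.retrieve("grades", "learning")))
base.recode("competition", "rivalry")
print([p.text for p in base.retrieve("rivalry")])
```

The store only files and fetches; deciding that a passage is about competition, or that a denial means its opposite, remains the researcher's interpretation.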