Study Material. Do not distribute. 6 Components, Cores and Cliques One of the most enduring concerns of those who have turned to social network analysis has been the attempt to discover the various 'cliques' and cohesive sub-groups into which a network can be divided. The early researchers of the Hawthorne and Yankee City studies, I have shown, saw the idea of the 'clique' as being their central theoretical discovery. The argument was that people's informal social relations tied them into cohesive sub-groupings that had their own norms, values, orientations and sub-cultures, and that may run counter to the 'official' or formal social structure. The cliques were, they held, among the most important sources of a person's identity and sense of belonging, and their existence was widely recognized in the everyday terms - such as 'our set' and 'the group in the back' - that people used to describe their social world. Once analysts began to try to formalize the idea of the clique and to devise mathematical measures of the number and cohesion of cliques, it was appreciated that the idea was not limited to informal relations. There were also political cliques and factions, economic cliques and interest groups, and so on. It was also recognized that there were a number of different ways of operationalizing the apparently simple idea of the 'clique': for example, cliques could be seen as groups of mutually connected individuals or as pockets of high density. Thus, a number of different theoretical models of subgroups emerged, variously described as 'cliques', 'clusters', 'components', 'cores* and 'circles'. Apart from beginning with the letter 'c', these concepts have very little in common with one another. In this chapter I shall discuss their varying theoretical bases, though I will leave the issue of 'cluster analysis' until the following chapter. The starting point for all of these measures of group structure is the idea of a 'sub-graph'. A sub-graph is any collection of points selected from the whole graph of a network, together with the lines connecting those points. Any aspect of the graph can be chosen for identifying its sub-graphs, though not all of these criteria will be substantively useful in research. A random sample of points, for example, could be treated as a sub-graph and its structural properties could be examined. But a random sub-graph is not, in general, Components, cores and cliques 101 likely to correspond to a meaningful social group. A more useful -r .*!=!(?' approach to the identification of sub-graphs might be to divide the f members of a network by, say, gender and to investigate the separate sub-graphs of men and women. Any such choice will depend on the theoretical and empirical concerns of the researcher. The general aim would simply be to define a meaningful category of agents and to explore their distinct patterns of network formation. ; From this point of view, therefore, the identification of sub-graphs j is no different from the initial identification of the graphs them-,; selves. All the considerations of boundaries and sampling that have been considered in earlier chapters will be equally relevant here and no new issues are involved (see, for example, Frank, : 1978b). j Clique and similar analyses normally adopt an alternative approach to the study of sub-graphs. Their aim has been to y* investigate the structural properties of the whole graph itself in order 1 to discover the 'naturally existing' sub-graphs into which it can be Wt; divided. From this point of view, a sub-graph must have some J defining characteristic drawn from the mathematical principles of graph theory: the connectedness of its points, the intensity of their connection, and so on. It is a sub-graph that is maximal in relation to a particular defining characteristic: it is the largest sub-graph that can be formed in the graph without this defining quality disappear-\ ing. The choice of a particular characteristic depends on the j researcher's decision that a particular mathematical criterion can be i given a meaningful and useful sociological interpretation. Unfortu-! nately, this is rarely made explicit, and far too many researchers assume that whatever mathematical procedures are available in social network analysis programs must, almost by definition, be |SJ1 M useful sociological measures. My aim in this chapter is to uncover the mathematical assumptions of the various available procedures so that researchers can make an informed decision about those that might be relevant to their particular investigations. Components, Cycles and Knots The simplest of the various sub-graph concepts is that of the component, which is formally defined as a 'maximal connected subgraph*. A sub-graph, like a graph, is 'connected' when all of its points are linked to one another through paths: all points in a connected sub-graph can 'reach' one another through one or more paths, but they have no connections outside the sub-graph. Within a component, all points are connected through paths, but no paths ri'si-i run to points outside the component. When the connected sub- 102 Social network analysis Components, cores and cliques 103 graph is maximal, it is impossible to add any new members without destroying the quality of connectedness. Isolated points, for example, cannot be joined with an existing component, as they have no connections to any of its members. The boundary of a component; therefore, is identified by tracing through the paths from its potential members to test for their connectedness. A computer algorithm for identifying components might start from a randomly chosen point and trace all the other points to which it is directly connected. This same procedure can then be repeated for each of these points in turn, and so the component gradually increases in size through a 'snowballing' method. When no further points can be added to the component, its full membership has been identified. If any points remain outside the component, the same procedure can be repeated for them, so as to see what other components can be identified in the graph. Components, then, are sets of points that are linked to one another through continuous chains of connection. The paths connecting points are traced through until the boundaries of the component arc discovered, A 'connected graph', of course, simply comprises a single component. Other graphs typically consist of one or more separate components, together with a number of isolated points (see Figure 6.1). This idea is readily interpretable in sociological terms. The members of a component can, in principle, communicate with one another, either directly or through chains of intermediaries. Isolates, on the other hand, have no such opportunities. The pattern of components found in a graph - their number and size - can, therefore, be taken as an indication of the opportunities and obstacles to communication or the transfer of resources in the associated network. To this extent, then, they embody the ideas behind the 'topological regions' of the early field theorists. A basic step in the structural description of a network, therefore, is to identify the number and size of its components. The simplest of algorithms to detect components in a graph would search all possible paths in order to discover the geodesies between points. The lengths of these geodesies will vary from a minimum of 1 (direct connection) to a maximum of r — 1. In a graph of size 100, for example, the maximum possible path length would be 99. In large graphs, however, the longest geodesic in a component - its 'diameter' - is generally much shorter than this.1 However, the diameter of a component will not generally be known before the boundaries of the component have been identified, and so such an algorithm must search all paths up to the maximum level of n— 1 in the search for components. • « « isolated points Figure 6.1 Components in a network Because such a procedure is very time-consuming and inefficient, it is not practicable for most computing purposes. For this reason, social network packages generally use an alternative procedure. Components are discovered by building up 'spanning trees', using a back-tracking method from chosen points. The algorithm looks for any point that is connected to a starting point, and it then looks for any point that is connected to this additional point. This is repeated until no further connections can be found. The algorithm then back tracks along the chain that it has discovered until it is able to make a connection to a new point. It continues in the same way until it again comes to a halt. By repeated back-tracking of this kind, the boundaries of a component are discovered very efficiently and the procedure can search the remaining points for other components. Components can be searched for in both undirected and directed graphs, but there are important differences between the two situations. In the case of directed graphs, two distinct types of component can be identified: 'strong components' and 'weak components'. A strong component is one in which the lines that make up the paths are aligned in a continuous chain without any change of direction. Any paths that do not meet this criterion are disregarded. The justification for this restriction is that the direction of a line is assumed to indicate the possible flow of some resource 104 Social network analysis or facility, such as money, power, or information. It is only when the lines in a path run in a constant direction that this flow can continue without interruption. A strong component, then, represents a set of agents among whom such resources can easily and freely flow. An alternative, weaker interpretation can also be placed on directed lines. It can be assumed that the mere presence of a relationship, regardless of its direction, allows some possibility for communication. From this point of view, components can be identified from the semi-paths in the graph. Components in a directed graph that are identified in this way, disregarding the direction of the lines that make up the paths and taking account simply of the presence or absence of a connection, are termed weak components. The distinction between strong and weak components does not, of course, exist in undirected graphs. In these situations the researcher is dealing with what might be called 'simple components': as no directions are attached to the lines, all paths constitute acceptable connections. Computer algorithms for identifying simple components in an undirected graph are, in principle, identical to those for identifying weak components in a directed graph. It is only when the question of direction has to be explicitly dealt with that the algorithms differ. The result of a component analysis is a view of the graph as composed of one or more components (simple, weak or strong components) and a number of isolated points. Dense graphs are likely to show the dominance of a single large component, especially where the analysis is concerned with simple or weak components. In order to achieve a more fine-grained analysis, it is generally necessary to attempt to probe the internal structure of components. Everett has proposed an extension of the component idea that aims to achieve such a fine-grained view of the texture of dense networks. His approach (1982, 1983a, b, 1984) is based on a graph theoretical concept that he terms the 'block'. There is a great deal of confusion over the word 'block', as it has been used in a number of radically different ways in social network analysis. To try to avoid some of this confusion, I propose to make some terminological innovations. For reasons that will soon become apparent, I shall refer to Everett's concept not as the 'block', but as the 'cyclic component'.2 The concept of the cyclic component depends on that of the cycle. A cycle is a path that returns to its own starting point, and, like a path, a cycle can be of any length. The cycles in a graph can be described by their length as 3-cycles, 4-cycles and so on. Putting this in its most general form, graph theorists can identify what Components, cores and cliques 105 (i) (H) A B A B E F G f-* ©■-< ►-i > < >—.-( i-< )-o »-o DC C D H i J ■ Figure 6.2 Cyclic components I I. Everett terms ^-cycles, where k is any specified cycle length. A ]: useful first step in the analysis of cycles is to decide on a maximum ( cycle length for consideration. Any cycle of greater length than this j is ignored. If a maximum cycle length of 4 is chosen, for example, sociogram (i) in Figure 6.2 contains four cycles of length 4 (ABCDA, BCDAB, CDABC and DABCD) and six cycles of length 3 (ABDA, BDAB, DABD, BCDB, CDBC and DBCD).3 At a maximum cycle length of 3, only the shorter cycles remain and points A and C are not connected by any cycle. Everett goes on to define a bridge as a line that does not itself lie on a cycle but that may connect two or more cycles.4 Sociogram (ii) in Figure 6.2, for example, contains, at maximum cycle length 4, the bridge BE. A cyclic component can be defined as a set of intersecting cycles connected by those lines or points that they have in common. The separate cyclic components of a graph, therefore, do not overlap with one another, though they may be connected by one or more bridges. Sociogram (ii) in Figure 6.2, for example, is not itself a cyclic component, it does, however, contain the cyclic components {A,B,C,D} and {E,F,G,H,I,J}. The latter set of points contains the line FI, which is common to the cycles EFIHE and FGJIF. It can be seen, therefore, that a cyclic component consists of a chain of intersecting cycles, where the intersections are lines or points common to the overlapping cycles.5 The cyclic components of a graph are identified by removing from a graph all those lines that are bridges at the specified cycle length (termed the '^-bridges'). The sets of points that remain are the cyclic components. Where an analysis of simple, weak or strong components results simply in the identification of components and isolates, an analysis of cyclic components generally produces more complex results. This is because the cyclic components will be connected to various points that are not themselves members of cyclic components. Everett (1982) has shown that the connected elements will fall into one of five categories: 106 Social network analysis 1 Cyclic components. 2 Hangers. These are points that are connected to a member of a cyclic component, but which do not themselves lie on a cycle. Hangers simply 'hang' on to a cyclic component. 3 Bridgers. The points that are 'intermediaries' or 'waverers' between two or more cyclic components, but which are not members of any of them. A bridger, then, 'hangs' on to two or more cyclic components. 4 Isolated trees. These are chains of points (including dyads) that are not connected to any cyclic component. The members of these 'trees' are linked to one another in a non-cyclic way.6 5 Isolates. Those points that have no connections at all, i.e., those which have a degree of 0. It can sometimes be difficult to give a substantive sociological interpretation to long paths of connection. This is a particular problem where long cycles tie large numbers of points together. There is, for example, a tendency for connected graphs to comprise a single, large cyclic component. Everett holds that, for most purposes, it is realistic to limit an analysis to relatively short cycles of length 3 or 4. At cycle length 3, for example, an analysis would be concerned simply with cyclic components built out of triads, to which a number of substantive interpretations can be given. At cycle length 4, an analysis would be concerned with those cyclic components that are built from either triads or 'rectangles'. If a researcher intends to use cycle lengths greater than 4, it is particularly important that the substantive sociological interpretation that is to be given to the mathematical structures should be both clear and meaningful. An analysis of cyclic components can also be undertaken for directed graphs. The simplest way of doing this would be to disregard the directions that are attached to the lines. Such an analysis, based on the semi-paths in the graph, would identify 'semi-cycles'. These are cycles in which no account is taken of the direction of the lines. This does, of course, involve some loss of information, but the procedure allows the identification of what can be termed weak cyclic components. In order to analyse strong cyclic components, the information on directionality must be retained. Everett has recommended that mis kind of analysis should, in fact, also include some of the semi-cycles. In a directed cycle the direction runs consistently through all the constituent lines. In Figure 6.3, for example, ABCA is a directed cycle. Trie path ABDA, on the other hand, involves a reversal of direction between A and D, and so is merely a semi-cycle. Everett defines a semi-cycle Components, cores and cliques 107 D C -Figure 6.3 Cycles and semi-cycles as being an 'acceptable semi-cycle' if points that do not lie on a directed cycle are, nevertheless, connected by two or more distinct directed paths. Thus, points A and D are not connected through a directed cycle, but they are connected through the directed paths ABD and AD. For this reason, ABDA is an acceptable semi-cycle. I In the identification of strong cyclic components, therefore, a ■t computer algorithm must search for both the directed cycles and the acceptable semi-cycles of a graph. Using this procedure, all the cycles in a directed graph will be identifiable as directed, acceptable, or unacceptable, and an analysis of strong components would take account only of cycles of the first two types. Using these cycles alone, the strong cyclic components of the graph can be identified, and it will also be possible to distinguish between 'hangers-on' and 'hangers-off, according to the direction of the lines that connect them. The hangers-on are those hangers that direct a line towards a member of a strong cyclic component, while the hangers-off are those hangers to whom a member of the component directs a line.7 An alternative way to probe the internal structure of components -i is to see whether there are particular points that have a pivotal significance in holding components together. Hage and Harary ■ (1983) have approached this, like Everett, through a concept that they designate as a 'block'. In their case, however, this term refers to those sub-graphs within simple components (or within the weak components of a directed graph) that have no 'cut-point'.8 A cut-point is one whose removal would increase the number of components by dividing the sub-graph into two or more separate .}■ sub-sets between which there are no connections. In the graph I component (i) shown in Figure 6.4, for example, point B is a cut-{ point, as its removal would create the two disconnected components ] p::: shown in sociogram (ii). None of the other points is a cut-point. 108 Social network analysis Thus, cut-points are pivotal points of articulation between the elements that make up a component. These elements, together with their cut-points, are what Hage and Harary described as the 'blocks*; Once again, I wish to avoid the conceptual confusion which results from the varying usages given to the word 'block' and so, in what follows, I shall use the more descriptive term 'knot'. The component in Figure 6.4, then, comprises the two knots {A,B,C} and {B,D,E,F}. The various cut-points in a graph, therefore, will be members of a number of knots, with the cut-points being the points of overlap between the knots.9 («) A C F F Figure 6.4 Knots and cut-points It is relatively easy to give a substantive sociological interpretation to the idea of a cut-point. It can, for example, be seen as indicating some kind of local centrality for the corresponding agent. Hage and Harary (1983) have argued that knots ('blocks' in their terminology) can be seen as being, for example, the most effective systems of communication or exchange in a network (see also Hage and Harary, 1991 and 1998 for applications of this view). Because they contain no cut-points, acts of communication and exchange among the members of a knot are not dependent upon any one member. There are always alternative paths of communication between all the points in a knot, and so the network that it forms is both flexible and unstratified. The Contours of Components I have looked so far at procedures for the identification of various kinds of components, and I have reviewed some proposals for Components, cores and cliques 109 analysing the elements that make up these components (the knots and cut-points) and those which lie outside the components (the hangers, bridgers, trees and isolates). In this and in the following section I will pursue the question of the internal structure of components more systematically. In this section I will assess how the 'contours' of components can be charted by identifying their 'cores', and in the following section I will look at the 'cliques' and /'circles' from which components are built. ■ I showed in Chapter 2 that the work of the Yankee City researchers involved an attempt to identify the core and peripheral rnembers of what they called 'cliques'. This procedure can more usefully be applied to the internal structure of components. The ■contours of components can be disclosed by a procedure that is usually termed the 'nesting' of components, and that was briefly discussed in Chapter 3.10 Nesting successive analyses of components involves using progressively stronger cut-off criteria for drawing the boundaries of components at each step of the analysis. When combined into a single picture, the result of such a procedure is a series of concentric bounded sets of points. The basic image in a nested analysis is that of a contour map or of a set of Russian dolls, each component being 'nested' within a larger component. A component is visualized as having a core of especially cohesive or intensely connected points, with the boundaries of the core being gradually extended to include more and more points as the cut-off level of cohesion or intensity is weakened. At the weakest level of connection, all connected points are included in a single component. Figure 6.5 illustrates a simple case of nesting. The points in set A are the most tightly connected, and they comprise the core of their component. The boundary of set B is drawn with a weaker criterion Figure 6.5 Nested components 110 Social network analysis of connection and so includes all the points of set A together with the additional points that are connected at this weaker level. Finally: set C has its boundary determined by the very weakest criterion of connectedness and so includes all connected points. Sets D, E and V in the second component can be interpreted in the same way. Thus, each of the components in a graph can be de-composed into its core elements and a contour diagram of the graph can be drawn. Component detection algorithms treat all connections as binary data, as indicating simply the presence or absence of a relation, A valued graph, therefore, must be analysed by converting its actual values into binary, 1 or 0, values. This is done by comparing entries in the matrix for the valued graph with a 'slicing' or 'dichotomizing' threshold.51 Entries above or below the specified threshold are dichotomized into binary values: those above it are given the value 1, and those below it are given the value 0. These binary values can then be used in the search for components. A valued adjacency matrix might, for example, contain entries that show the multi^ plicities of the lines, and this matrix could be 'sliced' by choosing progressively stronger levels of intensity. By studying the components that are identified at each threshold level, the researcher can construct a contour diagram of nested components such as that shown in Figure 6.5. The boundaries of the components are drawn as concentric loops, and the diagram shows the 'peaks' of high intensity and the 'plains' of low intensity. Two alternative methods of nesting have been proposed: one based on the use of the degrees of the points as a measure of cohesion, and the other based on the use of the multiplicities of the lines as a measure of intensity. The degree-based measure results in the identification of '^-cores', while the multiplicity-based measure results in the identification of 'm-cores'.12 Seidman (1983) has proposed that the structure of components can be studied by using a criterion of minimum degree to identify areas of high and low cohesion. An analysis of the resulting jt-core structure of a graph, he argues, is an essential complement to the measurement of density, which I have shown fails to grasp many of the global features of graph structure. A A>core is a maximal subgraph in which each point is adjacent to at least k other points: all the points within the i-core have a degree greater than or equal to k.n Thus, a simple component is a 'l£-core'. All its points are connected to one another and so have a degree of at least 1. To identify a 2fc-core, all points with degree 1 are ignored and the structure of connections among the remaining points is examined. The 2&-core consists of those remaining connected points that have a degree of 2. A 3&-core is identified by deleting all points with a Components, cores and cliques 111 Figure 6.6 A 3k-core degree of 2 or less, and so on. Figure 6.6 illustrates a 3£-core. In this sub-graph, all points have a degree of at least 3. Although there are two points with degree 4 (points B and J), there would be no 4/c-core in this graph, as a k-core must have at least k + I members. A jt-core, then, is an area of relatively high cohesion within the whole graph. But it is not necessarily a maximally cohesive subgraph - there may be areas of very high cohesion that are connected to one another rather loosely. In Figure 6,6, for example, the cohesive areas |E,F,G,HJ,K,L} and {A,B,C,D,E,1J are connected through the weaker links CE and U. scores, then, constitute areas of the component within which cohesive sub-groups, if they exist, will be found.14 Seidman also shows how the overall fragmentation of a network can be assessed by looking at what he calls the core collapse sequence. The points in a A-core can be divided into two sets: those that are in a k+\ core and those that are not. The latter group Seidman terms the fc-remainder. The remainder in any core comprises those points that will 'disappear' from the analysis when k is increased by 1. It is the disappearance of these less well-connected points that causes the core to 'collapse' as k is increased. Seidman proposes that the proportion of points that disappear from a core at each increase in k can be arranged in a vector (a simple row of figures) that describes the structure of local density within the component.15 112 Social network analysis A Value of k Remainder Remainder as proportion 0 0 0 1 2 0.3 2 0 0 3 4 0.6 4 No points left F Figure 6.7 Collapse of a k-core This can be illustrated through the sociogram in Figure 6.7. All six points are connected, and so the increase in it from 0 to 1 involves no loss of points. At k = I all points are contained in a core, but there is a remainder of 2 (points A and F) that will disappear when k is increased to 2. At k = 2, points B, C, D and E remain, each with a degree greater than or equal to 2. As these points are, in fact, mutually connected at degree 3, there is no remainder at k = 2. When k is increased to 3, however, the remainder is 4, as all points will disappear when k is increased to four. Arranging the sequence of remainders from k = 0 in a vector gives the following core collapse sequence: (0, 0.3, 0, 0.6). The core collapse sequence gives a summary of the 'dumpiness' of the component. A slow and gradual collapse in the core, argues Seidman, indicates an overall uniformity in the texture of the network. An irregular sequence of values, as shown in Figure 6.6, shows that there are relatively dense areas surrounded by more peripheral points. The persistence of zero values in the vector up to high levels of k indicates a uniformity of structure within the component; the appearance and persistence of zero values after low levels of k indicates the existence of clumps of high density. By contrast with scores, which are based around the degrees of the points, 'm-cores' are based around the multiplicities of the lines. The notion of an m-core describes the original nested components discussed by the gradap group.16 An m-core can be defined as a Components, cores and cliques 113 -4 — maximal sub-graph in which each line has a multiplicity greater than or equal to m. An m-core is a chain of points connected by lines of the specified multiplicity. As in the case of a fc-core, a simple component is a 1 m-core, as all of its points are connected with a multiplicity of at least 1. In a 2m-core, lines of multiplicity 1 are ignored, and components are identified on the remaining lines. In a 3m-core, lines of multiplicity 1 and 2 are ignored, and so on. Figure 6.8 shows a simple 3/n-core. All the points are connected through paths of multiplicity greater than or equal to 3, the weaker connections of the points to those outside the core being disregarded. As points B and C are connected by a line of multiplicity 4, they form a two-member 4m-core. It is the nesting of cores within one another that discloses the overall shape of the network.17 A D Figure 6.8 A 3m-core Seidman's idea of the core collapse sequence can be extended to m-cores: indeed, the idea is far simpler to apply to them. This can be illustrated with the sociogram in Figure 6.9. Lines are progressively removed as the value of m is increased, and the remainder at each level of m is the number of points that will disappear when m is increased to m+1. Two points disappear when m is increased from 1 to 2, but no further points disappear until m reaches 4. If m is increased to 5, all points will disappear, as the highest multiplicity in the graph is 4. Thus, the m-core collapse sequence for this component is: (0, 0.28, 0, 0.28, 0.43). To complete this section it is necessary to consider the analysis of nesting in relation to cyclic components. Cyclic components can, of course, be identified in valued graphs by using an appropriate 'slicing' value. By varying the slicing criterion it is possible to arrive at an analysis of nested components - in this case, of nested cyclic components.18 Taken together, the various extensions of the basic idea of the simple component provide a powerful set of concepts for analysing the level of fragmentation in a network. They supplement the 114 Social network analysis Components, cores and cliques 115 Value of m m-cores Remainder Remainders as proportion 0 {A, B, C, D, E, F, G} 0 0 1 {A, 8. C, D, E, F, G} 2 0.28 2 {A, B, C, D, E) 0 0 3 {A, B, C,0, E ) 2 0.28 4 {A, B, C ) 3 0,43 5 No points left Figure 6.9 Collapse of an m-core measurement of density and help to overcome many of its limitations by highlighting the overall shape of the network. A full outline comparison of the global structures of networks of comparable size would involve measures of the overall density of the networks and their inclusiveness, the number and sizes of their components and their densities, and the nested structures of the components and their core collapse sequences. Cliques and their Intersections The concepts discussed so far in this chapter have gone some way towards formalizing the ideas of those early writers on social networks who talked about the 'cliques' discovered in the Hawthorne works and in Yankee City. But I have not yet considered the sociometric concept of the clique itself, which has arisen in discussions of the sociological applications of graph theory. There are a number of competing usages of the word 'clique', but the most widely held view is that its essential meaning is that of the 'maximal complete sub-graph' (Harary, 1969; Luce and Perry, 1949). That is to say, a clique is a sub-set of points in which every possible pair of points is directly connected by a line and the clique is not contained in any other clique.59 As Figure 6.10 shows, a three member clique Figure 6.10 Cliques of varying sizes contains three lines, a tour member clique contains six lines, a five member clique has ten lines, and so on.20 While a 'component' is maximal and connected (all points are connected to one another through paths), a 'clique' is maximal and complete (all points are adjacent to one another). Doreian (1979: 51-2) has spelled out some of the formal properties of cliques. The basic consideration is that all cliques are maximal sub-sets of points in which each point is in a direct and reciprocal relation with all others. In an undirected graph all lines are, by definition, reciprocal relations, and so a clique detection procedure will consider all the lines in the graph. In directed graphs, however, this is not the case: its matrix is asymmetrical, and only the reciprocated lines should be considered. In directed graphs, therefore, network analysis identifies what might be called strong cliques. On the other hand, if the direction of the lines is disregarded and simply the presence or absence of a relation is considered, the analysis treats all lines as if they were reciprocated and results in the identification of weak cliques.25 This concept of the maximal complete sub-graph is rather restrictive for real social networks, as such tightly knit groups are very uncommon. For this reason, a number of extensions to the basic idea have been proposed.22 The earliest of these extensions was the concept of the n-clique, which, it was claimed, is much closer to people's everyday understanding of the word 'clique'. In this concept, n is the maximum path length at which members of the clique will be regarded as connected. Thus, a 1-clique is the maximal complete sub-graph itself, the set in which all pairs of points are directly connected at distance 1. A 2-clique, on the other hand, is one in which the members are connected either directly (at distance 1) or indirectly through a common neighbour (distance 2). 116 Social network analysis 1 -clique 2-clique 3-clique Figure 6.11 n-ciiques of size 4 The value of n which is to be used in an analysis is chosen by the researcher, and a progressive increase in the value of n results in a gradual relaxation of the criterion for clique membership (see Figure 6.II). A 3-clique, for example, is a looser grouping than a 2-clique. The maximum value that can be given to n is one less than the total number of points in the graph. In practice, however, most large connected graphs are joined into a single n-clique at much shorter path lengths than this. A/-cliques can be identified through the relatively simple matrix multiplication methods that are available in many spreadsheet programs or in specialist network analysis programs. Multiplying the adjacency matrix by itself, for example, produces a matrix of path distances. The square of the matrix shows all distance 2 connections, the cube of the matrix shows distance 3 connections, and so on. Matrix multiplication is, however, a rather inefficient method of clique detection, and most specialist network analysis programs use a variant of the back-tracking procedure used for component detection. Because of the ease with which this can be done, clique detection for undirected graphs is a feature that is built into most social network analysis programs.23 It is possible to analyse n-cliques in a valued graph by applying a slicing criterion of the same kind as was discussed in the previous section. Such an analysis would generate a set of nested cliques for each level of n: nested 2-cliques, nested 3-cliques and so on. There are two important limitations on the use of the n-clique idea. The first and most important is that values of n greater than 2 can be difficult to interpret sociologically. Distance 2 relations can be straightforwardly interpreted as those which involve a common neighbour who, for example, may act as an intermediary or a broker. Path lengths greater than 2, however, involve rather more distant and weak links. While long, weak chains of connection may be very important for the overall structure of the network, as Granovetter and the 'small world' analysts have argued, it is not at Components, cores and cliques 117 all clear that they are appropriate for the definition of cliques. The ! very idea of a clique seems to demand relatively close linkages. It is, i therefore, difficult to justify the identification of n-cliques with I values of n other than 1 or 2. | 1 A F Figure 6.12 Sub-graphs and 2-cliques The second limitation on the use of the n-clique concept is the fact that intermediary points on the paths of the H-clique may not themselves be members of the clique. For example, points A, B, C, D and E in graph (i) of Figure 6.12 form a 2-clique, but the distance 2 path that connects D and E runs through the non-member F. The 'diameter' of the clique - the path distance between its most distant members - may, then, be greater than the value of n that is used to define the clique. Thus, the set {A,B,C,D,E} comprises a 2-clique, but it has a diameter of 3. Both Alba (1973, J 982) and Mokken (1974) have taken up this problem and proposed some further extensions to the idea of the n-ciique. Mokken has proposed that a more useful concept is one that would limit the diameter of the n-clique to n. That is to say, the researcher accepts, for example, distance 2 paths for the identification of clique members, but also requires that the diameter of the clique be no greater than 2. This concept he terms the n-clan. Graphs (ii) and (iii) in Figure 6.12 are, unlike graph (i), 2-clans.24 A different extension of the basic clique idea is that of the £-plex, proposed by Seidman and Foster (1978). Whereas the concept of the n-clique involves increasing the permissible path lengths that 118 Social network analysis (ü) E F Figure 6.13 A 3-clique and a 3-plex define the clique, the concept of the A>plex involves reducing the: number of other points to which each point must be connected. Thus, the points in a £-plex are connected at distance 1, but not ail points will be connected to one another. A A-plex is a set of points in which each point is adjacent to all except k of the other points,25 Thus, if k= 1, a I-plex is equivalent to a I-clique, and so it is a maximal complete sub-graph. Each member of the l-plex is connected to n- 1 other points. When k is equal to 2, all members in the 2-plex are connected to at least n - 2 of the other members, but the 2- plex may not be a 2-clique. In Figure 6.13, graph (i) is a 3-clique, as all pairs of points are connected at distance 3 or less. It is not, however, a 3-plex, as A, C, E and F are each connected to fewer than three other members. Graph (ii) is both a 3-clique and a 3- plex.26 An important consideration in the analysis of fc-plexes is that of the minimum size which the researcher will regard as acceptable for a plex. In particular, higher values of k ought to lead to a higher cutoff threshold for the size of acceptable £-plexes. When k takes a low value, /c-plexes can be relatively small, but higher levels of k will produce trivial results unless the minimum size of the acceptable k-plexes is increased. The reason for this is that small sub-graphs at high levels of k will be only minimally cohesive. As a rule of thumb, the minimum size for an acceptable £-plex should be k+2. Nevertheless, the concept of the fc-plex, considered as a generalization of the basic clique idea, seems to grasp more of the idea of cohesion than does the n-clique, especially when values of n higher than 2 are used.27 As in the case of n-cliques and components, the basic idea of Components, cores and cliques 119 a it-plex can be extended to valued graphs by using a slicing criterion to analyse 'nested fc-plexes'. In any but the smallest graphs, there will be a considerable amount of overlap among the various n-cliques and fc-plexes of which the graph is composed. Clique-analyses (of both n-cliques and A:-plexes) will tend to produce long lists of overlapping cliques, and these results may be difficult to interpret. A relatively dense network will tend to comprise a large number of overlapping cliques, with many points being members of numerous different cliques. A graph with 20 points and a high density, for example, could contain approaching 2000 overlapping cliques. In these circumstances, the density of the overlap among cliques may be more significant than the composition of the cliques themselves. Alba (1982) has, therefore, proposed that social network analysts should use concepts that explicitly recognize this fact of overlap. Drawing on work undertaken with Kadushin and Moore (Alba and Kadushin, 1976; Alba and Moore, 1978; Kadushin, 1966, 1968), he argued that the concept of the 'social circle* can be used to grasp significant structural features of social networks. This idea was devised by Kadushin from the initial insights of Simmel (1908), who first outlined the importance of the 'intersection of social circles'. The cohesion of a social circle is not founded on the direct 'face-to-face* contacts of its members, but on the existence of short chains of indirect connections that weld them together. Circles 'emerge' from interaction and may not be visible to their participants, as their boundaries are only loosely defined by the ramification of these indirect connections. Alba's contribution was to formalize the idea of the circle in sociometric terms by relating it to other graph theoretical concepts. His basic argument is that overlapping cliques can be aggregated into circles if they have more than a certain proportion of their members in common. Alba suggests that the most appropriate procedure is to use a kind of 'snowballing' method in which cliques are aggregated into progressively larger, and looser, circles. The first step in an analysis of circles is to identify 1-cliques of size 3 (triads) and then to merge into a circle all of those cliques that differ by only one member. Put in a slightly different way, the criterion for identifying circles in the first step is that cliques are merged into a circle if two-thirds of their members are identical. The result of this first step, then, might be one or more circles, together with a number of separate cliques and isolated points. At the second step the remaining cliques might be merged with those circles with which there is a lower level of overlap. Alba suggests 120 Social network analysis 1-cliques: {A,B,C} {B,C,D} {B.D.E} {B.F.GJ |B,G,E} tst step circles: {A,B,C,D,E} {B,F,G,E} 2nd step circles: (A,B,C,D,E,F,G) Figure 6.14 Intersecting social circles that a one-third overlap in membership might be appropriate in this second step. The result of this aggregation will be a large circle or a set of smaller circles surrounded by a periphery of less well-connected cliques and points. Figure 6.14 shows a simplified analysis of social circles. Two circles are identified at step 1, but they are merged into a single circle at step 2. As in so many graph theoretical procedures, it is important to note that the level of overlap that is chosen for aggregation is arbitrary. The levels suggested by Alba were chosen on common-sense mathematical grounds, and it will be necessary for researchers to decide whether his suggestions make sense in specific applications. The measurement of circles, therefore, takes the extent of the overlap between cliques as a measure of the distance between them. The particular way in which the cliques have been identified (as n-cliques or fr-plexes, for example) hardly matters in this procedure, as the subtle differences in the procedures rapidly disappear during the process of aggregation. In practice, the end result of an aggregation into circles is barely affected by the initial clique detection method that is used.211 Components, cores and cliques 121 Components and Citation Circles The sociology of science is one of the principle research areas in which a number of studies have invoked the idea of the social network. Crane's study (1972) of the 'invisible college' was one of the earliest pieces of research to use the idea of networks of < \ communication among scientists as a way of explaining the growth 1 of scientific knowledge. Crane's study involved the use of question- - f naires to obtain information on patterns of communication and \ influence among rural sociologists, and she analysed such pheno-: mena as co-publication and advice on areas of research specialization. Her concern was to outline the size and significance of the '. 1 invisible college of collaborators in the research specialism, but few * | sociometric concepts were used to uncover its internal structure. ' { Mullins (1973) adopted a different strategy. He looked at work in •: theoretical sociology and tried to discover the sub-groups of specialists that existed. Using material on education and career appointments as well as co-publication, he constructed sociograms for structural functionalist theory, small group theory, causal theory and a number of other areas.29 Unfortunately, the boundaries of the research specialisms were not themselves derived from sociometric analyses, and so Mullins's work gives little idea of the overall 1 _ structure of components and cliques in theoretical sociology. Gattrell's work, however, is one of the few studies in this area to have adopted a rigorous sociometric approach to the discovery of network structure. Gattrell (1984a, b) has used the techniques of Q-analysis (see n. 17 above) to disclose the structure of components in research groups. It is unnecessary to discuss the details of this complex procedure, as Gattrell used it simply in order to construct a nested model of components, and his ideas can readily be translated into the terminology of this chapter.30 Gattrell identified a set of geographical papers published between 1960 and 1978, which he regarded as the key elements in the literature on spatial modelling. Taking these papers as his population for study, he constructed a network of citation relations from their bibliographies and footnotes. Where the author of paper A cites the author of paper B, for example, a citation relation is directed from A to B. These citations can, therefore, be compiled into a binary matrix of directed lines. As the rows and columns of the matrix were ordered chronologically, by the date of publication, it was easy to assess any obvious shifts in citation patterns. If, for example, authors cited only relatively recent 1 papers, the T entries in the matrix would lie close to the diagonal. The more scattered are the ' 1' entries, the more widespread in time are citations. Any clustering around the diagonal would show 122 Social network analysis support for Price's hypothesis (1965) of the 'immediacy effect' in citation, but Gattrell found little support for this idea. The main aim of Gattrell's paper was to examine the component structure of the citation data, and from his initial matrix he compiled two analyses. First, he analysed the structure of the network of papers cited (the rows), and, second, he analysed the structure of the citing papers (the columns). Two cited papers are regarded as being connected to one another if they are each cited in the same source, and a component comprises a set of papers that are con^ nected through a continuous chain of such connections,31 Where two cited papers have more than one of their citers in commons they are connected at a higher level of multiplicity, and it is possible to investigate the nesting of components at various levels of multiplicity. Gattrell found that, at the lowest level, 49 of the papers were formed into a single large component. But at a multiplicity level of 6, this had shrunk to seven members. The seven papers in this component formed the core of the network. At the heart of this group were two highly cited papers by Hudson (1969) and Pederseri (1970). Hudson received 17 citations and Pedersen received 15 citations, but only eight of their citations were common to one another. Thus, Hudson and Pedersen formed a component of size 2 at multiplicity 8 (calculated from Gattrell, 1984b: 447). Gattrell concludes that: The general picture ... is of a small group of highly cited papers, to which other literature is connected at lower . . . [multiplicity] levels. A small component of papers concerned with hierarchical diffusion emerges, and other papers are added to this nucleus as a result of their being cited by some of the sources that cite the seminal papers, (Gattrell, 1984b: 448) The analysis of components and their cores, then, allows the investigation of the structure of influence in scientific research, such investigations pointing to the important role played by scientific cliques and circles in the promotion of particular ideas and approaches. The analysis of nested components in citation patterns highlights the 'star' cited papers and the extent to which there is any consensus over their star rating. Positions, Roles and Clusters The network concepts that have been discussed so far in this book have mainly been concerned with the particular patterns of direct and indirect contacts that agents are able to maintain with one another. They have been concerned with such things as the abilities of agents to join with one another in cohesive social groupings, their abilities to influence the actions of those particular others to whom they are connected, and so on. However, I have, at a number of points, alluded to the analysis of 'positions' rather than individual agents and their connections. Warner and Lunt (1942), for example, attempted to investigate the formation of distinct social positions, and Nadel (1957) argued that social roles were the central elements in social network analysis. The key concept in recent discussions of this problem is the idea of 'structural equivalence'. This involves a concern for the general types of social relations that are maintained by particular categories of agents. While two people may have direct connections to totally different individuals, the type of relations that they have with these others may, nevertheless, be similar. Two fathers, for example, will have different sets of children to whom they relate, but they might be expected to behave, in certain respects, in similar 'fatherly' ways towards them. The two men, that is to say, are 'structurally equivalent' to one another. They occupy the same social position - that of 'father' - and so are interchangeable so far as the sociological analysis of fathers is concerned. The idea behind structural equivalence, therefore, is that of identifying those uniformities of action that define social positions. Once the positions have been identified, the networks of relations that exist between the positions can be explored. Social positions are occupied by agents who are 'substitutable' one for another, with respect to their relational ties (Burt, 1982; Sailer, 1978). They are, in certain important respects, interchangeable. Although social positions are manifest only in the particular relations that link specific agents, they cannot be reduced to these concrete connections. They involve more enduring relations that are reproduced over time. These enduring relations among social positions constitute a distinct area of structural analysis.