Chena River Waterfront: Section 5.1 Delimiting the Scale of Behavioral Inferences

5.1 DELIMITING THE SCALE OF BEHAVIORAL INFERENCES: MOVING FROM FREQUENCIES OF CLASSIFIED ITEMS TO INFERENCES ABOUT THE PAST¹

Gould, Russell T. (1998) 5.1 Delimiting the Scale of Behavioral Inferences: Moving from Frequencies of Classified Items to Inferences about the Past. In Historical Development of the Chena River Waterfront, Fairbanks, Alaska: An Archaeological Perspective, edited and compiled by Peter M. Bowers and Brian L. Gannon, CD-ROM. Alaska Department of Transportation and Public Facilities, Fairbanks.

The Barnette Street excavations yielded a large number of artifacts for which many different types of observations have been recorded. For example, information exists on the manufacturer of a particular item, its state or condition (whole or fragmentary), its physical dimensions (weight, length, thickness, etc.), or the type of material from which it was made (such as leather, wood, metal, woven cloth, etc.). Information on an object’s context of recovery also is recorded—excavation area, unit, and level, along with associated items and their individual attributes. Because of the substantial number of observed attributes and ways in which observations could be aggregated, there are numerous schemes that could be presented in order to comment on life in early 20th century Fairbanks. This does not suggest that meaningful analysis is merely a series of meandering comparisons. Focus to analytical endeavors stems from the fact that one of archaeology’s primary research goals is to determine how information on both objects and their recovery contexts can provide insights into past behavior. This is not a problem presented only by the Barnette Project, nor an issue found only in Alaskan archaeology. It is a global problem of archaeology. In addressing the global problem, the debris from early 20th century Fairbanks yields valuable insights into one of the fundamental issues of archaeological inquiry—how it is possible to move from contemporary observations of the material record in order to make statements about past behavior.

Basic to this theoretical concern is not only how it may be possible to use archaeological data in making statements about the past, but also how it is possible to arbitrate between conflicting or contradictory statements. Over the past 35 years, American archaeology has undergone a fundamental shift in practice, moving from a largely historically driven enterprise towards a more rigorous scientific discipline.² Along with this shift in the nature of research, there have been several significant alterations in how archaeologists make knowledge claims about the past. Most important among these is the recognition that when we excavate, we do not directly observe the past. We make observations in the present on a static material record and use these to make inferences about past behavioral dynamics. Thus, behaviorally oriented investigations contain two elements: 1. the recognition of techniques that are used to assess similarities and differences among archaeological sites, and 2. the identification of the behavioral significance that lies behind such comparative outcomes. While the former might be mistaken as only a pragmatic concern for analysts, the theoretical nature of the latter is easily discerned. Since the significance of outcomes is dependent upon a necessary linkage between dynamic behavior states and observable outcomes, both elements ultimately intertwine in the same intellectual package. Simply put, the technique, in and of itself, holds little interest without the theoretical component.

The most frequently used means by which archaeologists assess similarities and differences of site deposits and their behavioral significance is with functionally based classifications. The necessary assertion that stands behind such schemes is that similarities and differences in behavior—at some scale—result in corresponding similarities or differences in the frequencies of classified items. In Alaskan historic archaeology, two functional schemes of debris classification predominate—the system employed in the Skagway excavations³ and the other developed by Rick Sprague,⁴ commonly used throughout western North America. Though each employs different classificatory units, both are claimed to provide a means to ascertain similarities and differences in behavior. When comparing such schemes, a preliminary issue is the scale of functional differentiation. Is differentiation manifest at the level of a single episode of a specific activity (such as food preparation or vehicle repair) or do differences reflect general long-term agents of site formation (such as household, saloon, or commerce-related activities)? A critical examination of these two classificatory schemes allows us to identify current theoretical limitations and problems, an exercise that yields growth in general archaeological knowledge.

An empirical assessment of the performance criteria of both classificatory schemes relative to prior behavioral knowledge is a useful first step. In many of the areas excavated by the Barnette project, historic documents and oral interviews reveal the nature and type of activities that occurred at those locations in the past. This provides a useful baseline against which behavioral comparisons can be made. Since a modified version of the Sprague scheme was one of the observations made on items recovered by the Barnette Project (Appendix 9), the classification’s performance can be directly assessed with the Barnette data. The classification developed by Spude (née Blee)⁵ for Skagway was not applied to the Barnette data. However, it can be assessed using the data set with which it was originally developed. Most importantly, this critical assessment of both schemes allows the sharp definition of the potential relationship between past behavior and archaeological observations. While this assessment does not provide an end-all theoretical statement about this relationship, it does allow the identification of potential boundary conditions for a theory which can be further investigated. Ultimately this allows archaeologists to make advancements in the direction of theory building by focusing on the type of relationships that must exist in order to link past behaviors with observable outcomes.

While both classificatory schemes parcel potential observations into quantifiable groups, some significant, sometimes unstated, sometimes untested assumptions are required. The first subsection below identifies these assumptions and defines potential spheres of variability. This clarifies the underlying logic that must operate in order for behavioral discrimination to occur. The remaining subsections contain empirical evaluations of Spude’s approach and Sprague’s system. The concluding section identifies the current status of theoretical concerns and offers suggestions for productive research avenues.

Both approaches to assessing functional differences begin with the identification of classificatory units in relatively commonsense terms. Neither strongly argues why a particular division necessarily must result in behavioral discrimination. Spude does go farther than Sprague does by offering accommodative arguments about the occurrence of items. She constructs her classes on the basis of hypothetical frequencies of items, either high or low, from hypothetical deposits created by different groups (i.e., families, male-only households, saloons, etc.).⁶ The evidence supplied for these connections is primarily their intuitive appeal, though she does say that they are reflective of object frequencies noted in photographs and other documentary sources.⁷ On the other hand, Sprague offers no real justification; he just states that his scheme’s greatest merit is that it works: it is pragmatic.⁸ Primarily on this basis, he asserts its utility. Neither scheme argues for a general classification-behavior connection. Rather, both strongly rely upon a notion of performance quality (that is, behavior discriminating ability) as the leading criterion for assessing merit. In this regard, Spude goes to much greater lengths to provide an assessment, one that ultimately proves problematic, as is shown below.

Two important aspects of the linkage between observations and behavior must be contended with in order to assess any tool that attempts to discriminate differences in behavior. Researchers trained in population statistics recognize that any set of diagnostic observations must form a recognizable distribution with unique parameters that can be used to distinguish it from others. Within-group variability describes the range of variation seen within a single population. The variability among groups reflects the increase in variation that is recognized when a measure of dispersion is simultaneously calculated for two or more distinct populations. In order to use a set of properties to distinguish between two or more groups, within-group variability must be known or, as usually is the case, accurately estimated. This provides the ability to distinguish unique groups when considered as part of a collection of groups.
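
The distinction between within-group and among-group variability can be illustrated with a small numerical sketch. The Python fragment below is an illustration only: the group names and proportions are hypothetical values invented for the example (not Barnette or Skagway data), and population variance is assumed as the measure of dispersion.

```python
from statistics import pvariance

def dispersion_summary(groups):
    """Return (variance within each group, variance of all groups pooled)."""
    within = {name: pvariance(values) for name, values in groups.items()}
    pooled = [v for values in groups.values() for v in values]
    return within, pvariance(pooled)

# Hypothetical proportions of one artifact class in two kinds of deposit.
example_groups = {
    "group_a": [0.10, 0.12, 0.14],
    "group_b": [0.40, 0.42, 0.44],
}
within, combined = dispersion_summary(example_groups)
```

In this invented case the dispersion calculated across both groups together is far larger than the dispersion inside either group, which is the condition under which the two groups can be told apart.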

Catherine Spude’s pioneering effort marks an important goal for building a theoretical bridge between observations and past behavior. Using multiple regression, she attempts to use the proportional frequencies of 13 artifact classes to determine the different agents responsible for deposit formation. In terms of function, different agents are identified at a general scale, including individual families, saloons, hotels or restaurants, brothels, and male-only households. Her classification uses 13 classes, including food storage containers, decorated dishes, undecorated dishes, other household items, generic personal items, child-specific items, female-specific items, male-specific items, liquor-related items, bottle stoppers or caps, pharmaceutical bottles, and armaments or military items.

She does not justify the relevance of these classificatory units by arguing a general theoretical relationship between classes and behavioral units. Instead, she hypothesizes frequencies within assemblages created by the groups she wishes to identify. In constructing a model of the distributional characteristics of each group responsible for deposit formation, she uses the mean proportion of each artifact class in her multiple regression model.

When using a linear model, such as multiple regression, several distributional properties must be manifest in order for the model to make accurate projections. One of the most critical is that the values of different variables must follow a normal or Gaussian distribution, indicated by the well-known bell-shaped curve, with the mean in the center and equal tails on both sides. One of the easiest ways to assess this distributional assumption is to plot the variables in question in a histogram and inspect for these properties.
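
A histogram check of this kind can be sketched in a few lines of Python. The values below are hypothetical proportions chosen to produce a multimodal shape; they are not Spude's data, and the bin width of 0.1 is an arbitrary assumption.

```python
import math
from collections import Counter

def bin_counts(values, bin_width=0.1):
    """Count how many values fall in each bin of the given width."""
    # round() guards against floating-point noise at bin edges (e.g. 0.3 / 0.1).
    return Counter(math.floor(round(v / bin_width, 9)) for v in values)

def text_histogram(values, bin_width=0.1):
    """Return one text line per bin, with one '#' per observation."""
    counts = bin_counts(values, bin_width)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        low = b * bin_width
        lines.append(f"{low:.2f}-{low + bin_width:.2f} | " + "#" * counts.get(b, 0))
    return lines

# Hypothetical proportional frequencies for six assemblages.
hypothetical = [0.02, 0.04, 0.05, 0.31, 0.35, 0.62]
mean = sum(hypothetical) / len(hypothetical)
for line in text_histogram(hypothetical):
    print(line)
```

With these invented values the mean falls in a bin that contains no assemblage at all, which shows how a mean can misrepresent a distribution with more than one mode.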

Figure 5.1 shows the proportional frequencies for liquor-related items in family household assemblages, which were the most heavily sampled group used to construct Spude’s model. Her calculation of the mean proportional value for these items is shown by a dashed line. As evident in this graph, these proportional values are not normally distributed. It is a bimodal or trimodal distribution. As such, the mean and variance say little since they do not provide an adequate summary of distributional form.

While this is an extreme case of a non-normal distribution, of the 13 classes used in Spude’s model for household assemblages, 11 of them clearly are not normally distributed (Appendix 10). As a result, the distributional requirements of the multiple regression model are not met. In and of itself, this does not mean that her model is inaccurate. It only means that it is suspect. In order to assess whether or not the mean proportional frequencies allow accurate projections of the values in samples, the goodness-of-fit can be checked.

Figure 5.1. Relative Frequency of Liquor-Related Items in Spude’s 14 Family Assemblages. The green vertical dashed line marks Spude’s calculated central tendency of .184.

It is possible to show how well Spude’s model expectations fit the cases on which the model was based. If the expected proportions of artifact classes are multiplied by the sample sizes of the assemblages used, the goodness-of-fit between expected and observed values can be calculated (Appendix 10). Table 5.1 shows the outcome of such tests between the expected values within her model and the assemblages on which it was based. The log-likelihood ratio statistic, G, is distributed as χ². The probabilities (p) shown in Table 5.1 represent the likelihood that both observed and expected values were drawn from the same population. Out of 14 assemblages, only one, the Peniel Mission/Mercier Family assemblage, fits the expected values. This clearly shows that using mean proportional frequencies of these classes does not adequately characterize the assemblages in question. This renders the results of her analysis ambiguous. Her approach does not allow the discrimination between different agents of site formation since the model does not capture the distributional properties of these assemblages.
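
The calculation described above can be sketched as follows. This is a minimal illustration of the standard log-likelihood ratio statistic, G = 2 Σ O ln(O/E), with expected counts obtained by multiplying model proportions by the assemblage sample size. The function name and the example counts are hypothetical, and the degrees of freedom used in Appendix 10 are not assumed here.

```python
import math

def g_statistic(observed_counts, expected_proportions):
    """Log-likelihood ratio goodness-of-fit statistic G.

    Expected counts are the model proportions times the sample size.
    Expected proportions are assumed to be positive.
    """
    n = sum(observed_counts)
    g = 0.0
    for observed, proportion in zip(observed_counts, expected_proportions):
        expected = n * proportion
        if observed > 0:  # a zero count contributes nothing to the sum
            g += observed * math.log(observed / expected)
    return 2.0 * g

# Hypothetical two-class assemblage tested against equal expected proportions.
g_value = g_statistic([30, 10], [0.5, 0.5])
```

A probability is then obtained by referring G to the χ² distribution with the appropriate degrees of freedom, either from a printed table or from a statistics library.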

Table 5.1. Goodness-of-Fit Between Spude’s 14 Family Assemblages and Values Expected with Model Relative Frequencies.⁹

In his paper on functional classification,¹⁰ Sprague never says how assemblages of items classified using his scheme should relate to one another. He focuses on the linkage between single artifacts and the activities accomplished with them. The nature of the relationship between behaviorally meaningful observations and classified items is purely definitional. Because Sprague only redefines objects within a milieu of activities, and because he does not discuss aggregates of classified items—that is, assemblages—Spude asserts that Sprague's scheme does not work for detecting behavioral similarities and differences at the assemblage level.¹¹ Neither Spude’s nor Sprague’s scheme truly is on firmer theoretical ground than the other. Since the ultimate utility of these schemes is how well either discriminates behavior, an assessment of linkage between behavior and classificatory outcomes when using Sprague’s scheme is warranted.

In such an examination, a preliminary assumption must be made: within relatively short time spans and the confines of specific excavation areas, the agents of deposit formation are assumed to be consistent. Divisions within spatio-temporal units are based on important changes, such as the means of transportation into Fairbanks. The switch from boat-based transportation to rail is such a shift. In addition, documentary information on the use of excavated structures figures into the grouping of related deposits. Using these data, a trial formulation of temporal and functional similarities and differences between deposits was constructed (Table 5.2). Many temporally mixed deposits were eliminated, as were those with very small sample sizes and those of uncertain functional orientation (Appendix 10). Excavation levels are grouped into general temporal and functional units. These units are subdivided into subgroups, whose members are spatially constrained within a single excavation area. The Cabin deposits in Area C are associated with the earliest occupation of Fairbanks (Features 2 and 8). Early Saloon levels consist of pre-Prohibition deposits located in three excavation areas: the California Saloon in Area C, the Miners’ Home Saloon in Area G, and the dump deposits along the bank of the Chena River in Area A. Early-Late Saloon deposits consist of temporally mixed levels from Area C. Late Saloon levels also come from Area C, dating to the post-Prohibition period when the Chena Bar was operated. Railroad (post-1923) and Pre-Railroad (pre-1923) levels largely come from excavations along the south bank of the Chena in Areas B1 and B2.

Table 5.2. Trial formulation of spatio-temporal units, showing subgroups and constituent area, levels, and sample sizes.

Unit	Subgroup	Area	Level	N
Cabin	1	C2	3a	179
	1	C2	3a/3b contact	5
	2	C2	3a1	157
			3b1	20
			3b2	23
			3b3	59
Early Saloon/California	1	A	3	4,525
			4	7,204
			4a	8,739
			4b	180
			4c	11,003
			4d	1,897
	2	C3	3b	1,273
	2	C3	4b	55
	3	C4	3b	282
			3c	830
			3c,4a,4b	92
			3d	119
			4a	2,199
			4b	5,149
			4d	87
Early Saloon/Miners' Home	1	G	3	1,676
			4	5,248
			5	565
Early-Late Saloon Mixed	1	C1	3c	3,639
	2	C2	3b	317
	3	C3	3	996
			3b, 3b.2, 3c	1,962
			3b.2	4
			3c	7,765
			3d	277
Late Saloon/Chena Bar	1	C1	3	2,516
			3 & 3b	1,193
			3b	3,266
			3d	274
	2	C2	1	568
			2	735
			3	361
	3	C4	3	425
Pre-Railroad	1	B1	none	76
			3	186
			4	48
			5	46
			6	321
	2	B2	3	428
Railroad	1	B1	2	8,129
	2	B2	2	393
	2	B2	2 & 3	212
	3	C1	3b & 3c	946
	4	C2	3c	228
Total				86,877

The Sprague classification scheme offers many different scales of functional resolution. In its most general rendition, seven primary classificatory groups are noted within the Barnette assemblage: 1-personal items, 2-domestic items, 3-architectural items, 4-personal and domestic transportation items, 5-commerce and industry related items, 6-group services items, and 8-unknown items.

The first step in the analysis of behavioral resolution when using Sprague's primary classes is a series of goodness-of-fit tests of proportional frequencies within subgroups of spatio-temporal groups (the Unit and Subgroup classifications in Table 5.2). At the primary class level, log-likelihood ratio values reveal only a few cases where similar proportions of primary classes are found (Appendix 10 provides the detailed analysis). These tests show only six cases where two levels within the same unit-subgroup clearly demonstrate equivalent proportions of items. The most common test result is one in which levels within the same unit and subgroup display significantly different proportions of Sprague primary classes.
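
One common way to test whether several levels share the same class proportions is a log-likelihood ratio test of homogeneity on a levels-by-classes table of counts. The sketch below assumes that framing; whether the tests in Appendix 10 were set up in exactly this way is not stated here, and the function name and example counts are hypothetical.

```python
import math

def g_homogeneity(table):
    """G test of homogeneity.

    table: list of rows (one per level), each a list of counts per class.
    Returns (G, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            if observed > 0:  # empty cells contribute nothing
                # expected count = row_total * col_total / n
                g += observed * math.log(observed * n / (row_total * col_total))
    rows_used = sum(1 for t in row_totals if t > 0)
    cols_used = sum(1 for t in col_totals if t > 0)
    return 2.0 * g, (rows_used - 1) * (cols_used - 1)

# Hypothetical counts: two levels with identical class proportions.
g_same, df_same = g_homogeneity([[10, 20, 30], [20, 40, 60]])
```

Levels with identical proportions give a G of zero regardless of their sample sizes; increasingly different proportions give increasingly large values.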

Two possible conclusions can be drawn from the analysis of primary Sprague classes: 1. the basis for the preliminary unit-subgroup scheme is wrong, because the historical depictions of an area’s use, the assumptions about homogeneity of depositional agents, or both, are incorrect; or 2. the primary Sprague classes have little value for diagnosing past behavior.

The knowledge of area use, not in a specific sense but at the level of similarity versus difference, is on firmer ground than the performance qualities of the primary Sprague classes. As a result, the primary Sprague classification is probably the source of the poor fit. In other words, because the performance quality of the Sprague primary classes is unknown, the classification is the more suspect of the two possible sources. However, inaccuracies in the unit-subgroup assignment cannot be ruled out as a cause of extreme within-group heterogeneity just yet.

In order to determine which of the two possible conclusions is implicated, the Barnette data were reassessed at the secondary Sprague class level (Appendix 10). Secondary Sprague classes subdivide primary classes into more discrete categories. As an example, the primary class 1-personal items is subdivided into 1A-clothing items, 1B-footwear-related items, 1C-adornment items, 1D-body ritual and grooming items, 1E-medical and health items, 1G-indulgences, 1H-pastimes and recreation items, and 1J-pocket tools and accessories. With regard to some types of items, the Sprague scheme expands to six levels. However, the secondary level is the finest level at which virtually all of the Barnette items are exhaustively classed. At the tertiary level, only a limited proportion of items can be expanded, since some commonly recovered items have no finer grouping than the secondary level. This is especially true among "unknown items" (Sprague primary class 8).
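
The relationship between the two levels of the scheme can be sketched as a simple tally. The fragment below assumes, purely for illustration, that each item carries a code string made of a primary digit followed by a secondary letter (as in 1A or 1G above); the list of items is hypothetical.

```python
from collections import Counter

def tally_by_level(codes, level):
    """Tally class codes truncated to `level` characters (1 = primary, 2 = secondary)."""
    return Counter(code[:level] for code in codes)

# Hypothetical items, all within primary class 1 (personal items).
hypothetical_codes = ["1A", "1A", "1B", "1E", "1G", "1G", "1G"]

primary = tally_by_level(hypothetical_codes, 1)
secondary = tally_by_level(hypothetical_codes, 2)
```

Here the primary tally records only seven personal items, while the secondary tally separates clothing, footwear, medical, and indulgence items, so two assemblages with the same primary counts can still differ at the secondary level.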

In tabulating the secondary class frequencies, some unit-subgroups were eliminated because of small sample sizes. Also, the six groups of levels which demonstrated homogeneous proportions at the primary class level were combined to produce aggregate levels, boosting their sample sizes. This resulted in a data set comprising 43 units of hypothetical functional groups of aggregated levels (Table 5.3).

Table 5.3. Area-level groups used in the analysis of secondary Sprague class frequencies.

In order to assess similarities and differences within and between hypothetical functional groups, nonmetric multidimensional scaling is used (Appendix 10). The virtue of this approach is that it is based on a rank order of data. Distributional assumptions are far less restrictive than those required when using parametric techniques, such as multiple regression. Multidimensional scaling also allows the production of easily interpretable graphical output.

The first step in multidimensional scaling is the production of a table within which similarities and differences between aggregate levels are characterized by distances. The distance table produces a grid resembling a mileage chart found in a typical highway atlas. Greater differences in proportional frequencies result in larger distances, while similar proportions produce smaller values. These values are based on φ (phi), which is similar to the classic 2x2 chi-square value, only it is corrected for sample size (Appendix 10).
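
A distance of this general kind can be sketched as follows. The exact calculation is given in Appendix 10 and is not reproduced here; the fragment assumes the common definition in which the chi-square value for a table comparing two assemblages is divided by the total sample size and the square root is taken. The function name and the example counts are hypothetical.

```python
import math

def phi_distance(counts_a, counts_b):
    """Sample-size-corrected chi-square distance between two assemblages.

    counts_a, counts_b: counts per class, in the same class order.
    Assumes phi = sqrt(chi-square / N) on the 2 x k table of counts.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each assemblage needs at least one item")
    n = total_a + total_b
    chi_square = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # class absent from both assemblages
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / n
            chi_square += (observed - expected) ** 2 / expected
    return math.sqrt(chi_square / n)

# Hypothetical assemblages with identical proportions but different sizes.
same_proportions = phi_distance([10, 20], [20, 40])
```

Under this assumption, assemblages with identical proportions are at distance zero whatever their sizes, which is the sense in which the measure is corrected for sample size.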

For the distances between the level groups in Table 5.3, 43 dimensions are required for a perfect fit since that equals the number of cases of aggregated area-levels (Table 5.3). Nonmetric multidimensional scaling finds the best fit for the distances when reducing the number of dimensions. In terms of a visual presentation, three or fewer dimensions are the most intuitive. To assess the goodness-of-fit, a value called stress (s) is calculated to describe how far the reduced-dimension solution departs from the actual distances. Though far from perfect, the fit of the three-dimensional solution, s = .196, is rough but usable (Appendix 10).
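
The stress calculation can be sketched as below. The fragment assumes Kruskal's stress formula 1, in which the distances in the reduced configuration are compared with their best rank-preserving (monotone) fit to the original distances; the formula actually used is documented in Appendix 10, and the function names here are hypothetical. Tied distances are not given special treatment.

```python
import math
from itertools import combinations

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while a block mean exceeds the mean of the one after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(distances, coords):
    """Stress-1 of a configuration against a square table of original distances."""
    pairs = list(combinations(range(len(coords)), 2))
    config = [math.dist(coords[i], coords[j]) for i, j in pairs]
    original = [distances[i][j] for i, j in pairs]
    # Order the pairs by original distance, then fit configuration distances monotonically.
    order = sorted(range(len(pairs)), key=lambda k: original[k])
    fitted_in_order = monotone_fit([config[k] for k in order])
    fitted = [0.0] * len(pairs)
    for rank, k in enumerate(order):
        fitted[k] = fitted_in_order[rank]
    numerator = sum((d - f) ** 2 for d, f in zip(config, fitted))
    denominator = sum(d ** 2 for d in config)
    return math.sqrt(numerator / denominator)
```

A configuration whose distances preserve the rank order of the original distance table has a stress of zero; the more the rank order is violated, the larger the value.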

Figure 5.2. Final functional group clusters. Note that most clusters exhibit minimal overlap with one another.

Figure 5.2 shows the three-dimensional solution. The clusters formed by spatio-temporal groups of aggregated levels are indicated as outlined clusters. The solution provides two critical pieces of information. It demonstrates a clear justification for the spatio-temporal groupings given in Table 5.3. It also identifies the scale of behavioral similarities and differences detectable when using secondary Sprague classes.

The first conclusion, the justification of the spatio-temporal groups, is demonstrated by the configuration of data point clusters. If spatio-temporal groups contained randomly selected level groups, the clusters would display two qualities: 1. a good deal of overlap between spatio-temporal clusters (the opposite of what is seen in Figure 5.2) and 2. large cluster volumes, since the included data points would consist of random values for Dimensions 1, 2, and 3 in Figure 5.2. In the three-dimensional plot, spatio-temporal groups such as Railroad, Cabin, or Early Saloon, Area G, have extremely restricted volumes within the three-dimensional cube. This means that, with the exception of a few outliers, there is homogeneity within groups and heterogeneity between these groups.

The second conclusion that can be drawn from the nonmetric multidimensional scaling solution is that the secondary Sprague classes do succeed in identifying some aspects of behavior. Since the Sprague scheme succeeds in delineating the common nature of spatio-temporal groups, it can be seen as being partially successful in identifying assemblages that owe their origin to common agencies of deposition. This is shown by the fact that each of the spatio-temporal groups in Figure 5.2 occupies a unique volume of the solution space. There is little overlap between clusters.

While the secondary Sprague classes succeed in identifying the common members of deposits that likely originated as a result of common activities, they do not allow the identification of generic area functions. While more than a few of the spatio-temporal units are the result of saloon activities, the fact that there is little to no overlap between their cluster volumes means that each is unique. In other words, the secondary Sprague classes monitor something different than a generic state of "saloon activities." The secondary classes do not allow the identification of generic agencies of deposition, like those that Spude attempts to define.

The source of detected differences between saloon-related deposits is not entirely clear. Since differences exist between units that are both contemporaneous and non-contemporaneous, it does not appear likely that temporal shifts are the cause. This does bring forward an intriguing question. Since the Sprague scheme is conceived as a functional scheme, and if temporal differences are not the cause of differences, then differences likely are the result of the secondary class level monitoring aggregated activities.

Spatial analytic studies in archaeology have long focused on a notion of "activity areas," within which the results of specific acts lead to the accumulation of diagnostic suites of items. At the secondary Sprague class level, significant differences of items between groups do not necessarily mean that deposits formed as a result of different generic functions. Based on the Barnette analysis, it appears that deposits that have similar generic functions are unique from one another. An observation of similarity when using secondary classes does tell something more. It attests not only to similarity in generic function, but also similarity at a much finer scale—that of suites of activities.

Since the Sprague scheme is originally centered upon a notion of a single artifact’s functional attributes, it does make sense that when considered at the assemblage level, aggregated activities are identified. This result is far from agreement with Spude’s assertion that the Sprague scheme does not work at all at the assemblage level. In fact, the analysis presented here shows that, of the two schemes assessed, Sprague’s scheme does provide an avenue for extracting behavioral information from archaeological data.

Using this information, it is possible to pose new questions that have import in further studies. Why, for example, do the primary Sprague classes fail to match expectations, yet secondary level classes seemingly provide behaviorally related observations? Further, what properties of the secondary classes allow for the assessment of similarities and differences at the level of aggregated activities, yet fail at general function? Investigation of these questions would allow researchers a better grasp of the theoretical connection between the properties of a classification scheme and behavior at different scales. This certainly would broaden the ability to use the archaeological record to make statements about past behavior.

Table 5.1, values:
Site	G	p
Peniel Mission/Mercier Family	13.55	0.139
Moore/Kirmse House	24.76	0.003
Mulliner Family Collection	126.27	0.000
Hamill House Dump	253.50	0.000
Feature 1 at Texas City	67.93	0.000
Feature 4 at Texas City	66.89	0.000
Locus 81-11 at Rochester Heights	178.46	0.000
Locus 81-12 at Rochester Heights	111.36	0.000
Locus 81-13 at Rochester Heights	123.55	0.000
Locus 27 at Rochester Heights	39.89	0.000
Bingham's Camp Cookhouse	61.78	0.000
Bingham's Camp Dump	72.75	0.000
Weiss Family Dump	156.75	0.000
Orchard Boss's Family Dump	363.06	0.000

Table 5.3, values:
Hypothetical Functional Unit	Area-Level(s)
Late Saloon/Chena Bar	C1-3
	C1-3 & 3b
	C1-3b
	C1-3d
	C2-1
	C2-2
	C2-3
	C4-3
Cabin	C2-3a, 3a/3b contact
	C2-3b1, 3b2
	C2-3a1, 3b3
Early Saloon/Miners' Home Saloon	G-3
	G-4
	G-5
Early-Late Saloon Mixed	C1-3c
	C2-3b
	C3-3
	C3-3b, 3b.2, 3c
	C3-3b.2, 3d
	C3-3c
Pre-Railroad	B1-3
	B1-4, 6
	B1-5, none
	B2-3
Railroad	B1-2
	B2-2
	B2-2 & 3
	C1-3b & 3c
	C2-3c
Early Saloon/California Saloon	A-3
	A-4
	A-4a
	A-4b
	A-4c, 4d
	C3-3b
	C3-4b
	C4-3b
	C4-3c
	C4-3c, 4a, 4b
	C4-3d
	C4-4a
	C4-4b
	C4-4d