Modeling Information Manufacturing Systems

to Determine Information Product Quality

August 1994 TDQM-94-06

Donald Ballou**

Richard Wang

Harold Pazer**

Giri Kumar Tayi**

Total Data Quality Management (TDQM) Research Program

Room E53-320, Sloan School of Management

Massachusetts Institute of Technology

Cambridge, MA 02139 USA

Tel: 617-253-2656

Fax: 617-253-3321

** Management Science and Information Systems

State University of New York at Albany

Albany, New York 12222

Tel: 518-442-4925

Fax: 518-442-2568

© 1994 Donald Ballou, Richard Wang, Harold Pazer and Giri Kumar Tayi


Acknowledgments: Work reported herein has been supported, in part, by MIT's Total Data Quality Management (TDQM) Research Program, MIT's International Financial Services Research Center (IFSRC), Fujitsu Personal Systems, Inc., Bull-HN, and the Advanced Research Projects Agency and USAF/Rome Laboratory under USAF contract F30602-93-C-0160.

Modeling Information Manufacturing Systems

to Determine Information Product Quality

1. Introduction

Product quality in manufacturing systems has become increasingly important. The current emphasis on Total Quality Management (TQM) is a manifestation of this trend. Although increasing foreign competition has heightened attention to quality, quality control in manufacturing systems has a long tradition [Shewhart, 1931; Crosby, 1979; Taguchi, 1979; Deming, 1986; Juran, 1989; Feigenbaum, 1991]. Quality-driven organizations continually strive to improve their products in a variety of ways. Some changes are major, others minor, but taken together over an extended period of time such changes can yield profound improvements in the product's overall quality.

As in manufacturing systems, information quality in computer-based systems is becoming increasingly critical to many organizations. The current efforts toward information highways and networked organizations underscore the importance of information quality. Organizations are relying more on the quality of the raw data and the correctness of processing activities that ultimately determine the information outputs. They would obviously prefer that their information outputs be of the highest possible quality. As with product manufacturing, however, cost must be taken into consideration. A workable goal, then, is to achieve the highest possible information quality at a reasonable cost.

1.1. Information Manufacturing Systems

Many of the concepts and procedures of product quality control can be applied to the problem of producing better quality information outputs. Use of the term information manufacturing encourages researchers and practitioners alike to seek out cross-disciplinary analogies that can facilitate the transfer of knowledge from the field of product quality to the less well-developed field of information quality. We use the term information manufacturing advisedly. For the purposes of this research, we refer to information manufacturing as the process that transforms a set of data units into information products. In addition, we refer to information manufacturing systems as information systems that produce predefined information products. We use the term information product to emphasize the fact that the information output has value and is transferred to the customer, who can be external or internal.

The systems we model have an analogy in manufacturing known as made-to-stock [Buzacott & Shanthikumar, 1993]. Made-to-stock items are typically inventoried. When inventory is exhausted, requests for such products can be readily satisfied because the materials, procedures, and processes needed for their manufacture are known in advance. In the realm of information systems, an example of a made-to-stock product is a request by a client to his or her financial advisor for a portfolio risk analysis. This would be requested on an ad hoc basis, but the data and programs needed to perform the analysis would be in place ready to be used.

In our context, a predefined data unit could be, for example, a number, a record, a file, a spreadsheet, or a report. A predefined processing activity could be an arithmetic operation over a set of primitive data units or a non-arithmetic operation such as sorting a file. An information product could be a sorted file or a corrected mailing list. This information product, in turn, can be a predefined data unit in another information manufacturing system.

Viewing information systems in the light of what is known about producing high quality manufactured goods can be very useful. An example of the potentially fruitful cross-pollination between manufacturing and information systems is the concept of critical path. Identifying the critical path is, of course, a standard activity in manufacturing. In a complex information system that typically involves convergent and divergent flows of information, if one wishes to produce a certain information output sooner, one should first concentrate on those activities lying on the critical path.

1.2. Timeliness vs. Data Quality

We consider four attributes of information products in this paper: timeliness, data quality, cost, and value. In particular, we focus on timeliness and data quality. A detailed treatment of these factors will be presented later. Below we clarify the term "data quality" and briefly justify our rationale for separating the dimension of timeliness from the other dimensions of data quality in this research.

We use the term "data quality" in a generic sense. That is, "data quality" is a place holder for whatever dimensions of data quality are relevant. If for a particular application one is interested solely in the data's completeness, then one would replace the term "data quality" wherever it appears in this work with the word completeness. Similarly, if one is concerned with both completeness and interpretability, say, then this pair would replace "data quality." Also, it should be noted we use the term data quality for intermediate data products (those that experience additional processing) and reserve the terms information quality and information product for the final product which is delivered to the customer.

Timeliness is usually considered a dimension of data quality; see, for example, [Ballou & Pazer, 1985]. However, we separate out timeliness from the other dimensions of data quality, in part, because timeliness will have a different value if the information product is generated today rather than at some time in the future (using the same input data in both cases). This is not the case with the other dimensions of data quality. Also, the need to treat timeliness separately can be better understood by considering one of the ultimate goals of this research: to permit changes to the information manufacturing system, ranging from fine tuning to reengineering, in the context of customer concerns regarding the information products. Many information products are time sensitive, and thus any efforts directed toward improving these products must explicitly factor timeliness into the analysis.

1.3. Industrial Relevancy of Information Manufacturing Systems

Data quality has attracted a significant amount of attention in the industry lately due, in part, to the trends toward systems integration, total quality management, and business process reengineering. For example, the Gartner Group, a leading computer industry information service firm, indicated that, "A vital prerequisite for business process reengineering is the ability to share data. However, Gartner Group expects most business process reengineering initiatives to fail through lack of attention to data quality." The Wall Street Journal also reported that, "Thanks to computers, huge databases brimming with information are at our fingertips, just waiting to be tapped. They can be mined to find sales prospects among existing customers; they can be analyzed to unearth costly corporate habits; they can be manipulated to divine future trends. Just one problem: Those huge databases may be full of junk. ... In a world where people are moving to total quality management, one of the critical areas is data."

Information systems that can be modeled as information manufacturing systems are noticeable in almost every industry. However, they have largely been studied from the information systems' perspective, either empirically-based with a focus on effectiveness or system-oriented with a focus on efficiency. In contrast, modeling these systems from the information manufacturing system's perspective underscores the importance of an analytically-oriented approach to the quality of information products in the spirit of product manufacturing, and encourages researchers to develop a theoretical foundation which can be used to guide future work in the measurement, analysis, and improvement of data quality.

1.4. Purpose and Scope of Paper

In this paper, we present a set of ideas, concepts, models, and procedures which form some of the basic building blocks upon which a theoretical foundation for information manufacturing systems can be built. Based on these building blocks, we present a methodology and develop software tools for determining information product attribute values. In addition, we illustrate how these basic building blocks can be used to study options in improving the information manufacturing system.

In our context, information products can be produced on a regular basis, for example standardized regular billing such as monthly credit-card statements. In some cases there are few quality problems with an information product. However, timeliness and quality may be conflicting goals. For example, the sooner credit card bills are delivered to customers, the sooner the issuing company would be reimbursed. Also, the card issuer would be able to identify non-payment problems sooner. Speeding up the production of the monthly statement, however, could compromise quality. Our work is designed to provide tools for analyzing how changing the information manufacturing system would impact tradeoffs such as this one.

As is also the case with traditional product manufacturing systems, there is a partial separation between the strategic issues of product mix and pricing and the managerial issues related to the efficient manufacture of the desired products. In both cases, those designing the requisite manufacturing systems can make important contributions to these strategic decisions by determining economic and technical feasibility of possible variants of the product mix. While many of the strategic issues relating to product mix and pricing extend well beyond the domain of production planning, an important contribution of the production sector is in transforming product specifications into the desired components of the product mix. The concepts, techniques, and procedures presented in this paper permit the designers to assess the impact of various information manufacturing scenarios on timeliness, quality, and cost attributes of the information product. If necessary, modifications to individual products and/or the product mix can be made in light of such an assessment.

Thus, this paper does not explicitly address the significant issue as to whether the information products are appropriate, although it does facilitate analysis of revamped systems that produce different, presumably more appropriate, information products. It is not explicitly concerned with issues such as what kinds of data to use or what kinds of processing are required, but it does allow the designer to test out various alternatives. Also excluded from our present model is the information equivalent of made-to-order manufacturing, such as ad hoc queries. If such queries are requested frequently enough, they could be included in the analysis. However, if they are that well-defined, we have in some sense a made-to-stock situation.

1.5. Differences between Product and Information Manufacturing

Although much can be gained by incorporating concepts and techniques from product manufacturing into the realm of information manufacturing, the analogies between the two fields have important limitations. These limitations arise from the nature of the raw material used in information manufacturing, namely the original or raw input data.

A significant feature of data is that, although it is used, it does not get consumed. One might think of a file or a data base as analogous to in-process inventory. Yet such inventory gets depleted, whereas stored data can be reused indefinitely. In a sense, a database is more analogous to a tool crib than to inventory. With a tool crib, tools are used and then returned; they are not consumed. However, even this analogy heightens the differences between the two kinds of manufacturing. Tools are used to produce the manufactured product and are not incorporated into the product as is the case with data from a data base. A related issue is that producing multiple copies of an information product is inexpensive, almost trivial when compared to manufactured products.

Other major dissimilarities arise from attributes that characterize input data which include accuracy, believability, completeness, consistency, and timeliness. Some of these attributes may have analogies with manufacturing raw material, but the analogies can be weak. For example, one could say that a raw material arrived just-in-time (in a timely fashion), but one would not ascribe an intrinsic property of timeliness to the raw material. Other attributes such as the believability of input data simply do not have a counterpart in manufacturing.

An elaboration on the attributes of timeliness in information and product manufacturing is useful in showing the difference between the two. Many information products are time-sensitive. The stock report or sports page of today's paper are examples. What daily newspaper would publish the scores of baseball games that occurred several days ago? Although time can cause some materials such as certain chemicals and medicines to degrade, there is usually no need to produce them quickly to prevent loss of relevance of the raw material.

1.6. Background and Related Research

Organizations are now better equipped than ever to develop systems that use raw data originating from a variety of sources. Unfortunately, most databases are not error free, and some contain a surprisingly large number of errors; see, for example, [Johnson, Leitch, & Neter, 1981]. It has long been recognized that data problems can cause computer-based systems to perform poorly. The need to ensure data quality in computer systems has been addressed by both researchers and practitioners for some time; see, for example, [Martin, 1973; Cushing, 1974]. A growing body of literature has focused on data quality: what it is, how to achieve it, and the consequences arising from inadequate data quality [Firth & Wang, 1993]. For example, information quality was listed as one of the six major dimensions for evaluating the success of information systems in the survey conducted by [DeLone & McLean, 1992]. Their review of the major information systems literature found 23 measures of information quality. Some twenty dimensions of data quality were also identified by researchers at Bell Laboratories [Redman, 1992]. The role of system controls in ensuring data quality was described by [Bailey, 1983]. A model for tracking errors through a system to determine their impact on the information outputs has been developed by [Ballou & Pazer, 1985]. Procedures for achieving data quality have also been described in [Morey, 1982; Ballou & Tayi, 1989; Redman, 1992].

The impact of deficiencies in data that affect individuals' lives has also been formally examined by several researchers. Laudon determined that the records of many of those involved with the criminal justice system contain potentially damaging errors [Laudon, 1986]. The impact of errors in information on the likelihood of making correct decisions was analyzed by Ballou & Pazer [1990]. Other instances of the harmful impact of poor data quality are described in [Liepins & Uppuluri, 1990].

Although insightful, the research efforts on data quality reported in the existing literature have addressed issues from the information systems perspective; no general mechanism has been proposed to systematically track attributes of data. Our methodology allows for the systematic tracking of timeliness, quality, and cost. This capability can be used to analyze an information manufacturing system and, based on the analysis, to experiment with various options. Beyond these practical benefits, the ideas, concepts, model, and procedures proposed in this paper provide a common set of terms and thus support the building of a cumulative body of research in this domain. A major outcome of the work described in this paper is a model-based approach for studying information manufacturing systems.

In the following section, we introduce the foundation of our model. This model incorporates the various components of the information manufacturing system and key system parameters including timeliness, data quality as defined by the specific application, as well as value to the customer and cost of information products. In Section 3, we use the model to provide a methodology for analyzing the impact of system modifications on information product attributes. In Section 4, this methodology is exemplified through an illustrative example. Specifically, we focus on explaining the mechanics of the proposed methodology. Next, in Section 5, we present a real-life application, the Optiserve case, with the goal of demonstrating the methodology's usefulness and ease of implementation in improving an actual information manufacturing system. Toward this end, we highlight the modeling nuances needed to accommodate the realistic aspects of this case, and outline the methods for acquiring appropriate data. A summary and conclusions are given in Section 6.

2. Foundation of the Information Manufacturing Model

As previously stated, the term information manufacturing refers to a predefined set of data units which undergo predefined processing activities to produce information products for internal or external customers, or both. We postulate that each information product has an intrinsic value for a given customer, and we assume that the product's potential value to the customer may be diminished if it is untimely or of poor quality. The value of the information products can be improved by making appropriate changes to the information manufacturing system. The importance of doing this is attested to by [Hammer, 1990]. We seek to determine the key parameter values that will help to identify those changes to the system.

2.1. Modeling of Information Manufacturing Systems

To evaluate various system configurations, data units must be tracked through the various stages or steps of the information manufacturing process. Any of these steps has the potential to impact timeliness and data quality for better or worse. For example, introduction of additional quality control would enhance the quality of the data units but with a concomitant degradation in the timeliness measure. Also, improving a processing activity could result in higher levels of data quality and improved timeliness but increase the cost. The various components of the information manufacturing system are displayed in Figure 1.

Figure 1: Components of the Information Manufacturing System

The data vendor block represents the various sources of input raw data. Each vendor block can be thought of as a specialized processing block, one that does not have a predecessor block. Thus, one vendor (internal or external) can potentially supply several different types of raw data. The role of the processing block is to add value by manipulating or combining appropriate data units. The data storage block models the placement of data units in files or data bases where they are available as needed for additional processing. The quality block enhances data quality so that the output stream has a higher quality level than the input stream. The customer block represents the output, or information product, of the information manufacturing system. It is used to explicitly model the customer, which is appropriate, as the ultimate judgment regarding the information products' quality is customer based.

We envision that the modeling of the information manufacturing system would take place at an appropriate level of detail. For example, the effect of a quality block could be modeled by specifying the fraction of suspect units entering and the fraction leaving. At a more detailed level, the quality block splits the incoming stream into apparently good and suspect substreams. The suspect stream is examined and undergoes corrective action as appropriate. Depending on the nature of the defects identified, the stream of suspect items could be split into additional substreams, each of which would undergo different, appropriate corrective action. Associated with each of the substreams are probability values giving the likelihood of Type I and Type II errors, which, together with knowledge of the original fraction of defectives, yields the fraction of apparently correct and suspect units arriving at the next block [Ballou & Pazer, 1982]. Information regarding how this applies in the context of information processing can be found in Ballou & Pazer [1982] and Morey [1982]. For this paper we have chosen not to model at this level of detail.

The nature of the activities performed by the quality control blocks is context dependent. This is true even for the same data quality dimension. For example, suppose that an information product is dependent upon a form with blanks filled in by various parties. A quality control check in this case could be a scan of the form by a knowledgeable individual to identify missing information before the form is forwarded for processing. Another type of completeness quality control could be a verification that all stores have reported their sales for the most recent period. An accuracy check could be a comparison of this period's and last period's results with outliers flagged for verification. (Did sales in Region X really go up by 250%?).
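The quality control checks just described are easy to mechanize. The sketch below is our own illustration rather than part of the paper; the store list, record layout, and outlier threshold are hypothetical.

```python
# Hypothetical sketch of two context-dependent quality control checks:
# a completeness check (did every store report?) and an accuracy check
# (flag period-over-period outliers for verification).

EXPECTED_STORES = {"S01", "S02", "S03"}          # assumed master list of stores

def completeness_check(sales_reports):
    """Return the set of stores that have not reported this period."""
    reported = {r["store"] for r in sales_reports}
    return EXPECTED_STORES - reported

def accuracy_check(this_period, last_period, max_ratio=2.5):
    """Flag stores whose sales changed by more than max_ratio for review."""
    flagged = []
    for store, current in this_period.items():
        previous = last_period.get(store)
        if previous and current / previous > max_ratio:
            flagged.append((store, previous, current))
    return flagged

reports = [{"store": "S01", "sales": 120.0}, {"store": "S02", "sales": 300.0}]
print(completeness_check(reports))                     # {'S03'}
print(accuracy_check({"S02": 300.0}, {"S02": 100.0}))  # [('S02', 100.0, 300.0)]
```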

Figure 2 displays a simple information manufacturing system but one which captures many of the potential components and interactions. This system will be used in Sections 3 and 4 to illustrate concepts, components, and procedures developed for the information manufacturing model. In this system there are five primitive data units (DU1 - DU5) supplied by three different vendors (VB1, VB2, VB3). There are three data units (DU6, DU8, DU10) that are formed by having passed through one of the three quality blocks (QB1 - QB3). For example, DU6 represents the impact of QB1 on DU2. There are six processing blocks (PB1 - PB6) and accordingly six data units that are the result or output of these processing blocks (DU7, DU9, DU11, DU12, DU13, DU14). There is one storage block (SB1). In Figure 2, the storage block is used both as a pass-through block (DU6 enters SB1 and is passed on to PB3) and as the source for data base processing (DU1 and DU8 are jointly processed by PB4). Note that the autonomy of the data units need not be preserved. A new data unit DU11 that involves DU1 and DU8 is formed. The system has three customers (CB1 - CB3), each of whom receives some subset of the information products. Note also that, as discussed in Section 1.5, multiple copies of data can be produced. For example, two copies of DU6 are produced and used subsequently by PB1 and PB3. Note that the placement of a quality block following a vendor block (similar to acceptance sampling) indicates that the data supplied by vendors in general is deficient with regard to data quality.

Figure 2: An Illustrative Information Manufacturing System

It should be noted that this modeling is similar to the use of data flow diagrams (DFD). "Vendor" and "customer" blocks are analogous to "external entities", "Process" block to "function", and "Data Storage" block to "data store". We have deliberately chosen not to use this terminology and notation primarily to emphasize in our exposition the analogy with product manufacturing, the theme of this paper. Also, the concept of quality block does not have a direct analogue in the DFD technique. However, those wishing to use DFD techniques to model an information manufacturing process certainly could do so. This would take advantage of knowledge of CASE tools held by many information systems professionals.

As will be explained in greater depth in Section 3, the data units have associated with them vectors of characteristics or parameters whose components change as a result of passing through the various stages of the information manufacturing process. What constitutes a data unit is context dependent. For example, if all fields for all records of a certain file possess the same timeliness and data quality characteristics, and if the entire contents of the file are processed in the same manner, then that file could be treated as a single data unit. In contrast, if the fields within a record differ markedly in terms of their timeliness and data quality attributes, then it would be necessary to model them individually. By this we mean that each field of each record would be treated as a different data unit. Clearly in practice compromises would have to be made to avoid an inordinate quantity of data units, but in theory there is no limit regarding their number.

The Optiserve case described in Section 5 will illustrate how to convert a given environment into an information manufacturing system model of the type displayed in Figure 2. That case will examine two alternative configurations, one consisting of rather minor changes, the other a reengineering of the system. The resulting information manufacturing systems are shown for both alternatives.

2.2. Measurement of Key System Parameters

We present below procedures for quantifying the four key attributes of the information manufacturing model: timeliness, data quality, cost, and value.

2.2.1. Timeliness

The timeliness of a raw or primitive data unit is governed by two factors. The first, currency, refers to the age of the primitive data units used to produce the information products. The second, volatility, refers to how long the item remains valid. The age of some data, that is, its currency, does not matter. The fact that George Washington was the first president of the United States remains true no matter when that fact entered the system. In contrast, currency matters in the case of a market free fall, when yesterday's stock quotes may be woefully out of date.

The currency dimension is solely a characteristic of the capture of the data; in no sense is it an intrinsic property. The volatility of the data is, however, an intrinsic property unrelated to the data management process. (We may choose to manage volatile data in such a way that it is reasonably current, but such activities do not affect in any way the underlying volatility.)

2.2.1.1. Timeliness Measure for Primitive Data Units

The first step in developing a measure for timeliness of a primitive data unit is to quantify the currency and volatility aspects of timeliness. Both currency and volatility need to be measured in the same time units.

It is natural to use time tags to indicate when the data item was obtained; see, for example, [Wang, Reddy, & Kon, 1992; Tansel et al., 1993; Wang, Kon, & Madnick, 1993]. This information is used to determine an appropriate currency measure. The currency measure is a function of several factors: when the information product is delivered to the customer (Delivery Time); when the data unit is obtained (Input Time); and how old the data unit is when received (Age).

Currency = Delivery Time - Input Time + Age (1)

As will be illustrated in Section 4, volatility is captured in a way analogous to the shelf life of a product. Perishable commodities such as food products are sold at the regular, full price only during specified periods of time. Degradation of the product during that time is not deemed to be serious. Similarly, suppliers of primitive or raw data units and/or data managers would determine the length of time during which the data in question remain valid. This number, which we refer to as shelf life, is our measure of volatility. The shelf life of highly volatile data such as stock quotes or currency conversion tables would be very short. On the other hand the shelf life of data such as the name of the first president of the United States would be infinite. The shelf life would be determined by the data quality manager in consultation with the information product consumers and of necessity is product dependent. If the information product is designed for customers who are long-term investors in the stock market, then quotes in today's paper regarding yesterday's close are more than adequate. If the product is for customers who are "in and out" traders, then the most recent trading price is appropriate. In the former case shelf life is in terms of one or more days. In the second case it is minutes or even seconds.

Our approach postulates that the timeliness of an information product is dependent upon when the information product is delivered to the customer. Thus timeliness cannot be known until delivery. The purpose of producing a timeliness measure is to have a metric that can be used to gauge the effectiveness of improving the information manufacturing system. For comparison purposes, it is important to have an absolute rather than a relative scale for timeliness. With this in mind we measure timeliness on a continuous scale from 0 to 1. Value 1 is appropriate for data that meet the most strict timeliness standard; value 0 for data that are unacceptable from the timeliness viewpoint. The currency or overall age of a primitive data unit is good or bad depending on the data unit's volatility (shelf life). A large value for currency is unimportant if the shelf life is infinite. On the other hand, a small value for currency can be deleterious to quality if the shelf life is very short. This suggests that timeliness is a function of the ratio of currency and volatility. This consideration in turn motivates the following timeliness measure for primitive data units.

Timeliness = {max[(1 - currency/volatility), 0]}^s (2)

In Sections 4-5, volatility is measured in terms of shelf-life, thus

Timeliness = {max[(1 - currency/shelf-life), 0]}^s (2a)

The exponent s is a parameter that allows us to control the sensitivity of timeliness to the currency-volatility ratio. Note that for high volatility (i.e., short shelf life) the ratio is large, whereas for low volatility (i.e., long shelf life) the ratio is small. Clearly having that ratio equal to or close to zero is desirable. As that ratio increases, is the timeliness affected relatively little (s = .5, say), a lot (s = 2, say) or neither (s = 1)? The appropriate value for s is context dependent and of necessity involves judgment. The relevance and applicability of this formula as well as the ones that follow will be discussed at the end of this section.
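A minimal sketch of Equations (1) and (2), assuming time is expressed in consistent units (days here) and that shelf life and the exponent s are supplied by the analyst; the numbers are purely illustrative.

```python
def currency(delivery_time, input_time, age):
    """Equation (1): currency = delivery time - input time + age."""
    return delivery_time - input_time + age

def timeliness(currency_value, volatility, s=1.0):
    """Equation (2): {max[(1 - currency/volatility), 0]}^s."""
    return max(1.0 - currency_value / volatility, 0.0) ** s

# A stock quote with a one-day shelf life, delivered half a day after capture:
c = currency(delivery_time=10.5, input_time=10.0, age=0.0)
print(timeliness(c, volatility=1.0, s=2.0))   # 0.25 -- very sensitive to staleness
print(timeliness(c, volatility=1.0, s=0.5))   # ~0.71 -- relatively insensitive
```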

Note that in product manufacturing, one would not ascribe an intrinsic property of timeliness to the raw material, whereas in information manufacturing timeliness is a key parameter that needs to be tracked. In Equation (2), volatility and the exponent s are given as inputs to the model, while currency is computed as will be presented in Section 3 and illustrated in Section 4.

2.2.1.2. Timeliness Measure for Output of Processing Blocks

Our goal is to attach a timeliness measure to each information output. Each such output is the result of certain processing and various inputs. Each input in turn can be the result of other processing and inputs. Potentially each information output is dependent upon several stages of processing and a host of primitive data units. This convolution is addressed by considering one block at a time. First, we focus on those blocks that involve processing activities, both arithmetic and non-arithmetic. Quality and storage blocks are treated next. It is important to keep in mind that a timeliness value is computed and attached to each process output. Timeliness is actually measured only for primitive data units.

Arithmetic Operations

Even simple cases present problems. Suppose, for example, that output value y is the difference of input values x1 and x2, i.e., y = x1 - x2. Assume further that x1 has a very good measure for timeliness whereas x2 has a poor measure for timeliness. If x1 = 1000 and x2 = 10, then the timeliness value for y is very good. Conversely, should x1 = 10 and x2 = 1000, the timeliness value for y is poor. Clearly any composite timeliness value must take magnitudes into account. How the variables interact must also be accounted for. If, for example, x1 and x2 have the timeliness measures described above, and are of roughly equal magnitudes, then outputs y1 = x1 + x2 and y2 = x1 * x2 clearly differ in how the poor level of timeliness of x2 impacts the timeliness of the outputs. From the calculus we know that given a function y = f(x1,x2,...,xn), xi = xi(t), then

dy/dt = (∂f/∂x1)(dx1/dt) + (∂f/∂x2)(dx2/dt) + ... + (∂f/∂xn)(dxn/dt)

This expression captures how the dependent variable is impacted by changes in time t. More importantly from our perspective, it accounts for the interaction among the independent variables. We, of course, are not concerned with rates of change of the variables with respect to time. Still the above can provide guidance regarding a timeliness measure for a computed output. Ordinarily one would expect that if the timeliness value for each of the inputs were 1, then the timeliness value for the output would be excellent, undoubtedly equal to 1 also. Conversely, if all primitive data items possess a timeliness value of 0, one would expect the timeliness value for any resulting output of the processing blocks to be 0 as well. Considerations such as these motivate our definition for timeliness of the output of a processing block that involves arithmetic computations. Let T(xi) denote the timeliness measure for xi and let y = f(x1,x2,...,xn) be an information output. Then we propose the following to represent or measure the timeliness of y.

T(y) = w1 T(x1) + w2 T(x2) + ... + wn T(xn) (3)

where

wi = |(∂f/∂xi) xi| / [|(∂f/∂x1) x1| + |(∂f/∂x2) x2| + ... + |(∂f/∂xn) xn|]

Equation (3) is a weighted average of the T(xi). It is assumed that each of the terms above is evaluated using those values that determine the output value for y. [If y = x1 - x2 and x1 = 1000, x2 = 10, then these values will be used as appropriate in Equation (3).] Note that if T(xi) = 0 for all i, then T(y) = 0 and if T(xi) = 1 for all i, then T(y) = 1. The dependence of the timeliness of y on the interactions of the xi is captured in a manner analogous to the chain rule of the calculus. Finally, the need to involve the magnitudes of the values is explicitly modeled. The absolute values ensure that the range 0 to 1 is preserved and that positive and negative values do not cancel each other. (As mentioned, the validity of the formula is further discussed at the end of this section.)
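To make Equation (3) concrete, the following sketch (ours, not the paper's software) evaluates the weighted average for the difference y = x1 - x2 discussed at the start of this subsection; the partial derivatives are supplied analytically.

```python
def weighted_timeliness(partials, values, t_values):
    """Equation (3): weighted average of input timeliness values, with
    weights |df/dxi * xi| / sum over j of |df/dxj * xj|."""
    magnitudes = [abs(p * v) for p, v in zip(partials, values)]
    total = sum(magnitudes)
    return sum(m * t for m, t in zip(magnitudes, t_values)) / total

# y = x1 - x2, so df/dx1 = 1 and df/dx2 = -1.
# x1 is timely (T = 0.9); x2 is stale (T = 0.1).
print(weighted_timeliness([1, -1], [1000, 10], [0.9, 0.1]))  # ~0.89: y is timely
print(weighted_timeliness([1, -1], [10, 1000], [0.9, 0.1]))  # ~0.11: y is stale
```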

Because the timeliness of data depends on its currency, and hence on the delivery time, timeliness measures cannot be stored. Rather, they must be determined at the time the information product is delivered to the customer. Delivering the same information product to different customers at different times would result in different timeliness values for these customers.

Non-arithmetic Operations

Data units can undergo processing that does not involve any arithmetic operations. For some types of data the processing does not change the timeliness value. For example, if the data unit is a file and the process is to sort the file, then the output data unit would have the same timeliness measure as the input data unit. Recall that the timeliness measure ultimately depends upon the volatility of the raw data and the time the customer receives the information product. Built into the latter value is the time for, say, sorting the file. Also, if the activity should be to extract a subset from a data unit, the resulting (subset) data unit would inherit the timeliness measure from the (superset) data unit. Another situation would be combining all or a portion of two or more data units. For example, suppose two data units (files) are merged. Then a natural timeliness measure for the resulting data unit would be some weighted average of the timeliness values for the original data units. This is consistent with the timeliness value for computed outputs. [Recall that Equation (3) is essentially a weighted average of the timeliness measures of the input data units.] The weights could reflect the size of the data units that are merged, their importance or some combination of attributes. Section 4 illustrates these concepts and methodology using equal weights for the inputs to processing blocks.

2.2.1.3. Quality and Storage Blocks

The timeliness measure for the output of a quality block is the same as that for the input data unit. Again this is so even though quality control consumes time, the justification being that all timeliness measures ultimately depend upon that fixed point in the future when all processing has been completed. Thus time for one specific activity is already incorporated. Analogously for storage activity, the timeliness of a retrieved data unit is that of the stored data unit, and weighting is appropriate for combinations of data units.

2.2.2. Data Quality

It is also important to be able to assess, in an overall sense, the quality of the information products. These products, as discussed before, are manufactured through multiple stages of processing and are based on data that have various levels of quality. For our model we need to determine how each type of block affects the quality of the input stream. Some cases are straightforward. For example, the storage of data does not affect its quality. (This assumes there is no external event such as accidental erasure of data.) If the incoming data to a storage block has a certain level of incompleteness, then the outgoing data has the same level. For the vendor block it is necessary to know the quality of the primitive data units. Determining this precisely can be difficult and may require, for example, a statistical analysis similar to that used by [Morey, 1982]. Alternatively, these values can be estimated by using some combination of judgment based on past experience and quality control procedures such as information audits. In any case, the quality estimations for the primitive data units are exogenous to the system being modeled.

For the quality block, the output data quality is typically better than the input data quality. The magnitude of the improvement must be determined by, or furnished to, the analyst. As with timeliness, weighting or inheritance is appropriate for certain types of processing. The least straightforward case is a processing block that involves arithmetic operations, which we now discuss.

Let DQ(xi) denote a measure of the data quality of data unit xi. As stated above, estimating the values for the DQ(xi)'s is an issue of concern only for the primitive data units. If all the inputs to some stage are themselves outputs of other stages, then the appropriate data quality measures have already been determined by applying at those previous stages the expression given below.

As before, we use a scale from 0 to 1 as the domain for DQ(xi) with 1 representing data with no quality problems and 0 those with intolerable quality. If all data items should have a data quality measure equal to 1 and if all processing is correct, then the quality measure for the output should be 1 as well. Conversely, if the quality of all the inputs is 0, then the quality of the output should be 0 as well.

Given this reasoning, we form a weighted average of the DQ(xi) values for the data quality of the output. Let y be determined by data items x1, x2, ..., xn, i.e., let y = f(x1, ..., xn). Then an estimate for the data quality of output y resulting solely from deficiencies in the input units, referred to as Data Component (DC), can be obtained from

DC = w1 DQ(x1) + w2 DQ(x2) + ... + wn DQ(xn) (4)

where

wi = |(∂f/∂xi) xi| / [|(∂f/∂x1) x1| + |(∂f/∂x2) x2| + ... + |(∂f/∂xn) xn|]

Note that DC satisfies 0 ≤ DC ≤ 1; DC = 0 if, and only if, DQ(xi) = 0 for all i; DC = 1 if, and only if, DQ(xi) = 1 for all i. Once again, DC involves the magnitude of the input values and the interactions among the data. Formulas analogous to (4) were used by Ballou and Pazer (1985). Data quality problems related to processing errors are discussed next.

Although it has been implicitly assumed that the processing activities are computerized, this need not be the case. In most systems some of the processing activities, such as data entry, have manual components. Especially in this situation, and to a lesser degree, with fully computerized systems, the processing itself can introduce errors. Let PE be a measure of processing effectiveness. If PE = 1, then the processing never introduces errors. If PE = 0, then the processing corrupts the output to such a degree that the data quality measure for that output should be 0. Thus, the output quality of y, DQ(y), is determined by both input data quality and processing effectiveness, i.e.,

DQ(y) = f(DC, PE) (5)

There are various possibilities for this relationship. For example, one such functional relationship is

DQ(y) = [DC * PE]^(1/2) (6)

Note that DQ(y) = 1 if, and only if, both DC and PE equal 1. Also DQ(y) = 0 if either DC or PE is 0. Also, if DC = PE should hold, then DQ(y) has the same value as DC and PE.
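The following sketch combines Equations (4) through (6) for a simple product y = x1 * x2. It assumes the geometric-mean form DQ(y) = sqrt(DC * PE) discussed above, which satisfies the properties just listed; the quality and processing-effectiveness values are illustrative.

```python
def data_component(partials, values, dq_values):
    """Equation (4): weighted average of input data quality values, using
    the same magnitude-based weights as Equation (3)."""
    magnitudes = [abs(p * v) for p, v in zip(partials, values)]
    total = sum(magnitudes)
    return sum(m * q for m, q in zip(magnitudes, dq_values)) / total

def output_quality(dc, pe):
    """Equation (6), example form: DQ(y) = sqrt(DC * PE)."""
    return (dc * pe) ** 0.5

# y = x1 * x2 with x1 = 50 and x2 = 4; df/dx1 = x2 and df/dx2 = x1, so both
# weights equal |x1 * x2| and the two inputs count equally here.
dc = data_component(partials=[4, 50], values=[50, 4], dq_values=[0.95, 0.80])
print(round(dc, 3))                        # 0.875
print(round(output_quality(dc, 0.99), 3))  # 0.931, slightly degraded by processing
```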

The data quality of the data items changes, of course, as these values undergo a series of processing and quality control activities. The inputs for a given process may well be outputs from other processes. Thus whenever a data value undergoes processing or quality control, the resulting quality measure of the output needs to be recorded so that the information is available for determining the quality of any subsequent outputs.

If the processing is complex, it may be necessary to substitute subjectively derived quality response functions for the calculus-based analysis. For example, this would be necessary if the processing block involves a linear program. In such cases, one could specify that the output quality is some function of the average input quality. This function could be determined by simulation.

2.2.3. Cost of Information Product

To evaluate the effectiveness of improving the system, it is necessary to compare changes in value to the customer with changes in cost. However, costing in a multi-input, multi-output environment such as the one we model is difficult to do, and the approaches available are often controversial. Implications of costing for multiple-use systems have been discussed by [Kleijnen, 1980]. Kraemer et al. also considered difficulties encountered in trying to predict and track costs [Kraemer, Dutton, & Northrop, 1981]. Nevertheless, because of its importance, there is a substantial body of literature dealing with the pricing of computer services; see, for example, [Bernard, Emery, Nolan, & Scott, 1977; Kriebel & Mikhail, 1980].

In our methodology, we adopt a cost accumulation and pro rata allocation approach which, although ad hoc, facilitates the estimation of the information product's cost in a straightforward manner. As long as this costing approach is applied consistently in evaluating all the possible options, it should not lead to erroneous decisions.

2.2.4. Value to the Customer

Ultimately, of course, the measure that counts is the value of the product to the consumer. This has been emphasized in both manufacturing and information systems environments [Deming, 1986; Garvin, 1988; Wang & Kon, 1993] . Our approach is to hypothesize an ideal product, one with 100% customer satisfaction. Any actual product would deviate from the ideal on several dimensions. Since our concern is with evaluating alternative system designs so as to improve either timeliness or data quality or both, it is natural in this context to limit consideration of the determinants of value to these dimensions. Thus for each customer C, the actual value VA is a function of the intrinsic value VI, the timeliness T and data quality DQ, i.e.,

VA = fc (VI, T, DQ) (7)

Given the above mechanism for measuring an information product's timeliness and data quality, a functional form for VA could be

VA = VI * (w * DQ^a + (1-w) * T^b) (8)

Here VI, w, a, and b are customer dependent. Also, the weight w is a number between 0 and 1 (inclusive) and captures the relative importance to the customer of product quality and product timeliness. For example, w = 1 implies timeliness is of no concern to the customer whereas w = .5 implies that quality and timeliness are equally important. The exponents a and b reflect the customer's sensitivity to changes in DQ and T. Variants of Equation (8) have been utilized in several previous research efforts; see, for example, Ahituv [1980] and Hilton [1979].
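A brief sketch of Equation (8); the parameter values below are hypothetical and would in practice be elicited from the customer.

```python
def actual_value(vi, dq, t, w=0.5, a=1.0, b=1.0):
    """Equation (8): VA = VI * (w * DQ^a + (1 - w) * T^b)."""
    return vi * (w * dq ** a + (1 - w) * t ** b)

# A product with intrinsic value 100, good quality but mediocre timeliness,
# for a customer who weights quality and timeliness equally:
print(round(actual_value(vi=100.0, dq=0.9, t=0.5), 2))         # 70.0
# The same product for a customer indifferent to timeliness (w = 1):
print(round(actual_value(vi=100.0, dq=0.9, t=0.5, w=1.0), 2))  # 90.0
```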

2.2.5. Discussion of Formulas

In this section various formulas have been presented to measure timeliness, data quality, and value. To use the information manufacturing model, these factors must be quantified. A discussion preceding each formula identifies properties that any measure of the quantity in question must possess. As indicated, some of these measures can be justified on the basis of previous work. Thus these formulas build upon the accumulated knowledge and can be applied in a wide range of situations. That being said, the precise expression for these measures is not critical for the information manufacturing model. If in a certain case or situation those responsible for implementation of the information manufacturing system feel that a different set of formulas would be more appropriate, then the analysis would proceed using their formulas exactly as with our formulas. For example, if it is possible that order-of-magnitude errors will not be detected (e.g., $12.53 is entered as $1253.), then Equation (4) should be replaced by a formula involving the root-mean.

3. A Methodology for Determining Information Product Attributes

The concepts and methods described in the previous section form the basis for the methodology introduced in this section. The purpose of this methodology is to determine information product attribute values which can be used to suggest ways to improve the system.

Timeliness, quality, and cost attribute values provide the producer with a means to assess the potential value of the information product for customers. As with product manufacturing, however, the producer should analyze the information manufacturing process with the goal of improving one or more of these values. Doing this may well result in degradation of the other attribute values but should in an overall sense enhance the value of the information product for the customer. The producer would have to determine whether the tradeoff is beneficial. To improve the timeliness value, for example, one needs to change the system. There are two ways of accomplishing this. Modifying data gathering procedures so as to obtain data which are more current is one approach. The other is to modify the system so as to process the data more rapidly. In both cases a mechanism is needed that can be used to determine what approach would produce the largest improvement in timeliness, in a cost effective manner. For this we present a tabular structure, which we call the information manufacturing analysis matrix.

The Information Manufacturing Analysis Matrix has one row for each of the data units, primitive and computed. With the exception of those blocks representing the data vendors, the matrix has one column for every block. Should a particular data unit pass through a certain activity block, then associated with the cell determined by the appropriate row and column is a five-component vector, the components of which are described below. Figure 3 presents the Information Manufacturing Analysis Matrix for the system displayed in Figure 2.

[Figure 3 is a matrix with one row for each data unit (DU1 - DU14) and one column for each activity block (PB1 - PB6, SB1, QB1 - QB3, CB1 - CB3); cells are marked with an "X" where the data unit passes through that block.]

Figure 3: The Information Manufacturing Analysis Matrix for Figure 2

The "X" found in certain cells of the matrix indicates that the data unit passes through that activity block and there is a five-component vector of parameters associated with that cell.

In order to determine appropriate modifications to the system, it is necessary to track time, data quality, and cost. The information needed for this is first described in general terms; a discussion of these parameters for each of the different types of activity blocks then follows. Listed below are the five components of the vector of parameters. (This is shown in detail in Figure 5.)

p: This specifies the predecessor or originating block. For example, the predecessors of PB2 are QB2, the origin of DU8, and VB3, the origin of DU5.

t1: This represents the time when the data unit is available for the activity. For example, the value for t1 for the vector associated with (DU10, PB6) is that time when DU10 is ready for the processing block PB6. In the special case when p is a vendor block (VBI) then t1 = Input Time as given in Equation (1), the expression for currency.

t2: This is the time when the processing begins. Processing cannot start until all data units needed for that block are available. Also, processing may begin at a scheduled time tsch. Thus t2 is the larger of max{t1's} and tsch.

DQI: This is the quality of the incoming data unit for a particular activity. It is, of course, the same as the data quality of the output of the predecessor block. As mentioned above, we use the term "data quality" as a place holder for whatever dimension or dimensions of data quality are of concern to management (with the exception of timeliness).

CostI: This represents the cost of the incoming data unit. In essence, CostI is the prorated accumulated cost of all the previous activities that this data unit has undergone. (We assume that if a data unit is passed on to more than one activity, then CostI for those activities is determined in some prorated fashion. This implies that total cost is preserved.)

We now examine in some detail the implications of the parameters for each of the activities. For this it is useful to use the notation DQo to refer to the output quality resulting from some process and Costo as the cost of the output. DQo is computed using the concepts and expressions given in Section 2, and it is used to determine DQI of successor blocks. Also Costo is the sum of all input CostI plus cost of the block.
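One possible way to represent the five-component vector attached to each non-empty cell of the analysis matrix is sketched below. The field and function names are ours, not the paper's, and the start-time rule simply restates the definition of t2 given above.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CellVector:
    """Five-component vector for one (data unit, activity block) cell."""
    p: str          # predecessor / originating block, e.g. "QB2"
    t1: float       # time the data unit is available to this block
    t2: float       # time processing of the block actually begins
    dq_in: float    # DQI: quality of the incoming data unit
    cost_in: float  # CostI: prorated accumulated cost of prior activities

def start_time(t1_values: List[float], t_sched: Optional[float] = None) -> float:
    """The t2 rule: processing starts no earlier than the latest input
    arrival time and no earlier than any scheduled start time."""
    t2 = max(t1_values)
    return max(t2, t_sched) if t_sched is not None else t2

# Two inputs to a processing block arrive at times 2.0 and 3.5 (hours),
# and the block is scheduled to run at 4.0:
print(start_time([2.0, 3.5], t_sched=4.0))   # 4.0
cell = CellVector(p="QB2", t1=3.5, t2=4.0, dq_in=0.92, cost_in=15.0)
print(cell.dq_in, cell.cost_in)              # 0.92 15.0
```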

Processing Block. For a processing block the interpretation for each of the parameters is straightforward. Information regarding the cost, processing time, and impact on quality for each processing block needs to be furnished to the analyst. Assuming arithmetic operations are involved, the quality of the output is computed using Equation (5). This is then the value of DQI for all blocks that receive that output. The time when the processing is complete is used as the t1 value for all blocks that receive that output. If there is a delay in getting the data unit to the next activities, it may be necessary to include a delay parameter as a descriptor of a processing block. This concept also applies to quality and storage blocks.

Quality Block. A quality block can operate on only one data unit at a time. It may, however, process different data units at different times. It is necessary that the t's reflect this. DQI is simply, of course, the DQo of the previous process. Determination of the DQo of the quality block would require something akin to the statistical analysis described by [Morey, 1982] . If t2 > t1 should hold, then time may be needlessly wasted at this step. A positive value for t2-t1 would reflect a scheduling delay or a queuing delay. If an entire file is being checked, we assume that the file is not ready to be passed on until all corrections and changes have been made.

Storage Block. The value for t2 is that time at which storing of the data unit commences. Assuming storage of data cannot affect quality, DQI = DQo. If a certain subsequent process should require a subset of some data unit (part of a relational table, for example), then that subset inherits the data quality properties of the data unit. A data unit is modeled for each subset, even if they should come from the same original data unit. Information regarding the cost and storage time also needs to be furnished to the analyst. Note that storage time is how long it takes to store the data unit. The data unit is available for subsequent processing any time after it is stored. The amount of time between when the data unit is stored and when it is used does not affect the overall timeliness value of information products delivered to customers unless the storage block should lie on the critical path. In this case, timeliness is impacted by the Delivery Time component of Equation (1).

Customer Block. For the customer block, t1 represents the time the product is finished, t2 the time it is received. For on-line delivery systems t1 = t2 could hold. For this activity DQI has the value of DQo of the final process that generated the product. Costo = CostI assuming the cost of delivery is negligible. If delivery affects cost or quality, then the impact can be modeled as an additional processing block.

3.1. Timeliness, Quality and Cost of Information Products: Customer Perspective

The above structure provides the basis for making changes to the information manufacturing system by allowing one to quantify the parameters of timeliness, quality and cost. We now discuss issues related to this in the context of the customer's needs. Determination of customer needs, especially for external customers, regarding these parameters can be made using market research techniques such as focus groups.

Timeliness. A value for the timeliness of an information product for a particular customer cannot be determined until the customer has received the information product. This value can be determined by first computing the timeliness values T(xi) for each of the primitive data units provided by the vendors using Equation (2). These values are then available as input to subsequent activities. As previously discussed, sometimes the timeliness value for an output from an activity block differs from the input values, sometimes they are the same. Whenever arithmetic processing is involved, Equation (3) would be invoked. Activities such as quality control affect timeliness via the Delivery Time component in Equation (1). Note that the need to wait until the time when the information product is delivered to the customer necessitates a "second pass" through the system in order to compute the timeliness values (to be illustrated in Section 4). If timeliness is specified and measured in terms of a contracted delivery date, then the second pass through the system is not necessary, as the "delivery time" is prespecified. However, the timeliness analysis is still important as it may be possible to deliver the product sooner and hence gain a competitive advantage.
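The "second pass" can be illustrated with a toy system; the block names, durations, and shelf lives below are hypothetical (not those of Figure 2), and the weights are taken as equal, as in the illustrative example of Section 4.

```python
# A toy two-pass computation (ours, not the paper's tool). The forward pass
# accumulates processing times to obtain the delivery time; the second pass
# then applies Equation (2) to the primitives and a weighted average
# (Equation (3), equal weights) to the computed output.

durations = {"QB": 1.0, "PB": 2.0}        # assumed block times, in hours
input_time = {"A": 0.0, "B": 0.5}         # when the two primitive units arrive
shelf_life = {"A": 24.0, "B": 8.0}

# Pass 1: unit B passes through quality block QB; then A and B feed process PB.
delivery = max(input_time["A"], input_time["B"] + durations["QB"]) + durations["PB"]

# Pass 2: primitive timeliness (Equation (2) with s = 1 and age = 0).
def t_primitive(du):
    currency = delivery - input_time[du]
    return max(1.0 - currency / shelf_life[du], 0.0)

t_output = (t_primitive("A") + t_primitive("B")) / 2.0   # equal weights
print(round(delivery, 2), round(t_output, 3))            # 3.5 0.74
```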

The information product's timeliness measure is, of course, a number between 0 and 1. Whatever the number happens to be, its significance or interpretation is customer dependent. Some customers might find the information product's timeliness to be satisfactory, others not. If it is determined that the product is not sufficiently timely, then this timeliness value serves as a benchmark to determine the impact on timeliness of various changes that could be made to the information manufacturing system. Using the framework described in this section, one can reengineer the system and re-compute the timeliness, quality and cost values. For example, one possible way to improve timeliness might be to eliminate a specific quality block, which could enhance the ultimate timeliness at the expense of quality.

Quality. The quality parameter is simpler in that the quality measure of the product delivered to the customer is simply that associated with the output of the final activity block. Again though, its meaning or significance is customer dependent. Whatever the number happens to be, the customer might feel that the quality is totally satisfactory, completely unsatisfactory, or anything in between. Again, the importance of the number is that it serves as a benchmark to gauge the magnitude of quality improvement resulting from making changes to the system. Note that different customers will perceive the quality differently. Some may feel the quality is fine whereas others may demand enhanced quality. In some sense the information producer needs to optimize total value across all products and all customers.

3.2. Cost and Value

The cost of the information product is of interest to the producer. The customer is concerned with value received and price paid, the latter being an issue beyond the scope of this paper. As discussed, both quality and timeliness influence value, as do many other factors which we deliberately have not modeled. To perform the analysis, some functional expression relating these quantities is required. Solely for the purposes of discussion we use Equation (8).

To maximize the net value (total value minus total cost) received by all customers for all information products, one must obtain information from the customers regarding each product's intrinsic and actual value together with the customer's perception regarding timeliness and quality. Suppose there are M customers and N information products. Then for each customer i and product j, an expression of Equation (8) applies, namely

VA(i,j) = VI(i,j) * [w(i,j) * DQ(j)^a(i,j) + (1 - w(i,j)) * T(i,j)^b(i,j)]     (9)

Many of the VI(i,j) values would be zero. The double subscript on T is necessary since the same product could be delivered to different customers at different times. Only a single subscript for product quality is required, as the computed quality measure of a particular product is the same for all customers. The customer's sensitivity to improvements in data quality and timeliness can be handled via the exponents a(i,j) and b(i,j) respectively. The producer wishes to optimize net value. Assuming appropriate and consistent units, the problem is given by

Maximize   Σ(i=1 to M) Σ(j=1 to N) [VA(i,j) - C(i,j)]     (10)

subject to   0 ≤ T(i,j) ≤ 1,   0 ≤ DQ(j) ≤ 1,   1 ≤ i ≤ M,   1 ≤ j ≤ N

Here C(i,j) represents the portion of the cost of product j assigned to customer i. It should be noted that this formulation is used to evaluate a small set of possible alternative information manufacturing system configurations rather than to perform a traditional optimization.
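As a concrete illustration, the sketch below evaluates Equations (9) and (10) for a hypothetical pair of system configurations. All parameter values here are invented placeholders, and the function and variable names are ours; in practice the values would come from market research and from the Information Manufacturing Analysis Matrix computed for each candidate configuration.

```python
# A minimal sketch of Equations (9) and (10). All numbers below are invented
# placeholders; real values come from market research and from the analysis
# matrix computed for each candidate configuration.

def actual_value(VI, w, DQ, T, a=1.0, b=1.0):
    """Equation (9): VA(i,j) = VI(i,j) * [w(i,j)*DQ(j)^a(i,j) + (1-w(i,j))*T(i,j)^b(i,j)]."""
    return VI * (w * DQ ** a + (1 - w) * T ** b)

def net_value(config):
    """Objective of Equation (10): sum of VA(i,j) - C(i,j) over customers i and products j."""
    total = 0.0
    for entry in config:   # one entry per (customer, product) pair with VI > 0
        total += actual_value(entry["VI"], entry["w"], entry["DQ"],
                              entry["T"], entry["a"], entry["b"]) - entry["C"]
    return total

# Two hypothetical configurations of the same system, each described by the
# (customer, product) pairs it serves.
config_A = [{"VI": 200, "w": 0.5, "a": 1, "b": 1, "DQ": 0.90, "T": 0.40, "C": 90}]
config_B = [{"VI": 200, "w": 0.5, "a": 1, "b": 1, "DQ": 0.80, "T": 0.60, "C": 80}]

# As noted above, the formulation is used to compare a small set of alternatives.
for name, cfg in [("A", config_A), ("B", config_B)]:
    print(name, round(net_value(cfg), 2))
```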

3.3. Software Tools

Two software tools have been developed, based on the methodology, to assist in the analysis of various options for improving information manufacturing systems [Pak & Pando, 1993]. The first, and simpler, tool is implemented in a standard spreadsheet. It allows the determination of the values in the Information Manufacturing Analysis Matrix and of measures including timeliness, data quality, and cost. For instance, the values in the illustrative example presented in the next section were derived using this tool. The second, and more sophisticated, tool is implemented using Borland's Halo GUI package and Microsoft C. The GUI allows the system designer to define a configuration by clicking and dragging icons. Once the configuration is defined, the program computes the values in the matrix and the measures. It also allows the analyst to ask various "what if" questions with or without a configuration change.

We have presented a methodology for determining the timeliness, quality, and cost of information products. In the next section, an example is presented to illustrate some of the conceptual and computational issues that will be encountered when the information manufacturing model is applied to real world scenarios. It also illustrates the business impact of possible changes to the system.

4. Illustrative Example

For continuity, the system depicted in Figure 2 and described in the previous section will be used. Figure 4 presents the descriptive inputs required to compute the timeliness, quality, cost, and value characteristics of information products to be delivered to an array of customers.

Input   Vendor   Cost   Quality   Input Time   Age   Volatility   Shelf Life   Timeliness Function
DU1     VB1      10     .9        0            2     Medium       60           Linear
DU2     VB1      10     .5        0            2     High         30           2nd Degree
DU3     VB2      20     .7        5            15    Low          90           Linear
DU4     VB3      30     .9        10           5     Medium       60           Linear
DU5     VB3      20     .8        10           5     Medium       60           Square Root

Figure 4(a): Descriptive inputs required for the primitive data units
Process   Cost   Time   Quality Function   Delay
PB1       20     3      1                  0
PB2       30     4      2                  0
PB3       30     6      0.5                0
PB4       10     2      1                  0
PB5       70     4      0.5                2
PB6       100    10     2                  1

(The quality function entry is the exponent applied to the unweighted average of the input qualities; these parameters also appear in the quality row of Figure 5.)

Figure 4(b): Descriptive inputs required for the processing blocks
Quality Block   Cost   Time   Quality Function   Delay
QB1             20     6      0.5                0
QB2             40     8      0.25               0
QB3             50     10     0.25               0

(The quality function entry k gives Qout = 1 - k * (1 - Qin), i.e., the fraction of the quality shortfall remaining after the block; these parameters also appear in the quality row of Figure 5.)

Figure 4(c): Descriptive inputs required for the quality blocks
Storage Block   Cost      Time   Delay
SB1             5/Input   1      0

Figure 4(d): Descriptive inputs required for the storage block

Figure 4 Data for illustrative example

Figure 4(a) identifies the eight descriptive inputs required for each of the five primitive data units. For example, DU2 is obtained from the first vendor at a cost of 10, is of intermediate quality, and is already 2 time units old when it enters the system at the beginning of the information manufacturing process. It is highly volatile, with a shelf life of only 30 time units and a second-degree timeliness function (i.e., s = 2 in Equation (2a)).

By contrast, only four descriptive inputs are required for each of the six processing blocks, as shown in Figure 4(b). For example, PB2 has a cost of 30 and requires 4 time units to complete. As noted in the previous section, when processing is complex, it may be necessary to substitute subjectively derived quality response functions for the calculus-based analysis. That is the approach followed in this example. For PB2, output quality is equal to the square of the unweighted average quality of the two input data units (DU5 and DU8). The output of this block is available to the next block without delay.

Each quality block also requires only four descriptive inputs, shown in Figure 4(c). QB2, for example, has a cost of 40 and requires 8 time units to complete. Once again a subjectively derived quality output function is employed: the effect of this quality block is to eliminate 75% of the difference between the quality of the input flow and the best achievable quality (i.e., Qout = 1.0). This output is also available without delay.

Figure 4(d) presents the three descriptive inputs required for the storage block. For simplicity only a fixed cost of 5 units per input is assigned. The storage time for SB1, 1 time unit, is the time spent to store a data unit and, as mentioned earlier, is unrelated to the time the data unit actually spends in storage. No additional delay is encountered.

The only additional requirement for analyzing the system is the specification of a cost accounting assumption relating to the allocation of input and processing costs across multiple outputs. The simplifying assumption of equal allocation is made for this example.

Based upon the descriptive inputs from Figure 4, an Information Manufacturing Analysis Matrix for this system is developed and shown in Figure 5.

Now we proceed to explain the mechanics of applying the methodology using the data given in Figure 4. It is instructive to view the column corresponding to PB6. This block requires 10 time units and incurs a cost of 100. Its quality output function is of the second degree, and there is a delay of 1 time unit in delivering its information product to the final customers. This processing block requires three inputs, DU4, DU10, and DU11, which arrive from VB3, QB3, and PB4, respectively, at times 10, 31, and 20. Since processing will not begin until all inputs are available, processing starts at time = 31. The quality and cost pairs of the three inputs are (0.9, 30), (0.9373, 73.75), and (0.9153, 77.6).
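A minimal sketch of these block-level computations is given below. The quality and cost rules are those described above (power of the unweighted average input quality for processing blocks, a fixed fraction of the quality shortfall removed by quality blocks, and equal cost allocation across outputs); the function names and data structures are ours, not part of the methodology.

```python
# A minimal sketch of the block-level computations used in this example; the
# function names and data structures are ours, not part of the methodology.

def processing_block(inputs, block_cost, exponent, n_outputs=1):
    """inputs: list of (quality, cost) pairs; output quality is the unweighted
    average input quality raised to the block's exponent (e.g., 2 for PB6)."""
    avg_q = sum(q for q, _ in inputs) / len(inputs)
    q_out = avg_q ** exponent
    cost_out = (sum(c for _, c in inputs) + block_cost) / n_outputs   # equal allocation
    return q_out, cost_out

def quality_block(q_in, cost_in, block_cost, remaining_fraction):
    """remaining_fraction = 0.25 means 75% of the gap to Qout = 1.0 is eliminated."""
    return 1.0 - remaining_fraction * (1.0 - q_in), cost_in + block_cost

# PB6: (quality, cost) for DU4, DU10, DU11; block cost 100; three customers served.
q13, c13 = processing_block([(0.9, 30), (0.9373, 73.75), (0.9153, 77.6)],
                            block_cost=100, exponent=2, n_outputs=3)
print(round(q13, 3), round(c13, 2))    # 0.842 and ~93.8, the DU13 values discussed below
```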

Block parameters:

              PB1    PB2    PB3    PB4    PB5    PB6    SB1    QB1    QB2    QB3
  time         3      4      6      2      4     10      1      6      8     10
  cost        20     30     30     10     70    100      5     20     40     50
  quality      1      2     0.5     1     0.5     2      1     0.5   0.25   0.25
  delay        0      0      0      0      2      1      0      0      0      0

Data unit entries (each data unit row records its p, t1, t2, DQI, and CostI values in the block columns, PB1 through QB3 and the customer blocks CB1-CB3, where the unit appears):

  DU1    p = SB1, VB1         t1 = 1, 0          t2 = 18, 0         DQI = 0.9, 0.9                 CostI = 15, 10
  DU2    p = VB1              t1 = 0             t2 = 0             DQI = 0.5                      CostI = 10
  DU3    p = VB2              t1 = 5             t2 = 6             DQI = 0.7                      CostI = 20
  DU4    p = VB3              t1 = 10            t2 = 31            DQI = 0.9                      CostI = 30
  DU5    p = VB3              t1 = 10            t2 = 17            DQI = 0.8                      CostI = 20
  DU6    p = QB1, SB1, QB1    t1 = 6, 7, 5       t2 = 6, 7, 6       DQI = 0.75, 0.75, 0.75         CostI = 15, 20, 15
  DU7    p = PB1              t1 = 9             t2 = 9             DQI = 0.725                    CostI = 55
  DU8    p = QB2, SB1, QB2    t1 = 17, 18, 17    t2 = 17, 18, 17    DQI = 0.9312, 0.9312, 0.9312   CostI = 47.5, 52.5, 47.5
  DU9    p = PB2              t1 = 21            t2 = 21            DQI = .7493                    CostI = 97.5
  DU10   p = QB3, QB3         t1 = 31, 31        t2 = 31, 31        DQI = 0.9373, 0.9373           CostI = 73.75, 73.75

Figure 5: The Information Manufacturing Analysis Matrix for the illustrative example

  DU11   p = PB4              t1 = 20            t2 = 31            DQI = 0.9153                   CostI = 77.6
  DU12   p = PB3              t1 = 13            t2 = 13            DQI = 0.866                    CostI = 50
  DU13   p = PB6, PB6, PB6    t1 = 41, 41, 41    t2 = 42, 42, 42    DQI = 0.842, 0.842, 0.842      CostI = 93.75, 93.75, 93.75
  DU14   p = PB5, PB5         t1 = 35, 35        t2 = 37, 37        DQI = 0.9681, 0.9681           CostI = 71.876, 71.876

Figure 5 (continued): The Information Manufacturing Analysis Matrix for the illustrative example

The output of this processing block is DU13, which is represented by a row in the Information Manufacturing Analysis Matrix. It can be seen from this matrix, as well as from Figure 2, that DU13 is a final information product which is provided to three customers. It is created at time = 41 by PB6 (31 + 10), but since there is a one-unit delay it is not received by the customers until time = 42. Since the average quality of the three inputs described above is .9176 and the quality output function is of the second degree, the data quality of DU13 is .842. The cost of the three inputs added to the processing cost yields a sum of 281.35. Since this is distributed equally over the three outputs, CostI = 93.75 for DU13.

Figure 6 provides the information required to evaluate the information products for the customers who receive them. The relevant cost and quality values are obtained from the Information Manufacturing Analysis Matrix. As discussed in the previous section, determining the timeliness value requires a "second pass" through the system. For example, once it is determined that DU12 is delivered to the customer at time = 13, the currency value of its primary data input (DU2) can be determined by Equation (1) as 13 - 0 + 2 = 15. Therefore, the timeliness value of DU2 can be determined by Equation (2a) as (1 - 15/30)^2 = 0.25. Since there is only one input, this is also the timeliness value of the information product, DU12. In a similar manner, once the delivery times for DU13 and DU14 are determined to be 42 and 37, the timeliness values of the various primitive inputs can be determined by Equation (2a). Starting with these and employing equal weights (for simplicity) at each point of convergence, the timeliness values for DU13 and DU14 are determined to be 0.35 and 0.46.
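A minimal sketch of this second-pass computation for DU12's single primitive input is shown below, assuming the forms of Equations (1) and (2a) used above (currency = delivery time - input time + age; timeliness = [max(1 - currency/shelf life, 0)]^s); the function names are ours.

```python
# A second-pass timeliness computation for one primitive input, assuming
# Equation (1): currency = delivery_time - input_time + age, and
# Equation (2a): timeliness = max(1 - currency/shelf_life, 0) ** s.

def currency(delivery_time, input_time, age):
    return delivery_time - input_time + age

def timeliness(curr, shelf_life, s):
    return max(1.0 - curr / shelf_life, 0.0) ** s

# DU2 (Figure 4(a): age 2, input time 0, shelf life 30, second-degree function),
# feeding DU12, which is delivered at time 13.
c = currency(delivery_time=13, input_time=0, age=2)    # 15
print(timeliness(c, shelf_life=30, s=2))               # 0.25, as computed above
```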

The right-hand portion of Figure 6 presents the customer-specific descriptive inputs required to determine the value of the information products to the three customers. Also for simplicity, the linear version of Equation (8) was utilized for all customers. In this example, market research has determined that Customer 1 finds data quality and timeliness to be equally important; timeliness is twice as important as data quality for Customer 2, while the reverse is true for Customer 3. This example also shows that the intrinsic value of the same information product can vary among customers.

Output   Cost   Quality   Timeliness   Customer   VI    w     a    b
DU12     50     .866      .25          CB1        100   .5    1    1
DU13     93.8   .842      .35          CB1        200   .33   1    1
                                       CB2        180   .33   1    1
                                       CB3        160   .33   1    1
DU14     71.9   .968      .46          CB2        120   .67   1    1
                                       CB3        140   .67   1    1

Figure 6: Information required to evaluate the information products

Figure 7 presents the results of the illustrative example and highlights the "bottom line" of this entire analysis. It shows that, in the aggregate, this information manufacturing process generates 64.69 in net value (the difference between the total value to the customers and the total cost to the firm). This is not the same as net profit, but it suggests that, in the aggregate, it should be possible to negotiate prices which provide both a profit to the manufacturer and a net value to the customers.
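The Figure 7 entries can be reproduced directly from Figure 6 with the linear form of Equation (8); for instance, the short sketch below recomputes the DU12 row for Customer 1 (the variable names are ours).

```python
# Recomputing one Figure 7 cell with the linear form of Equation (8) (a = b = 1):
# value = VI * [w*Q + (1 - w)*T], net value = value - allocated cost.
VI, w, Q, T, cost = 100, 0.5, 0.866, 0.25, 50      # DU12 delivered to Customer 1 (Figure 6)
value = VI * (w * Q + (1 - w) * T)
print(round(value, 2), round(value - cost, 2))     # 55.8 and 5.8, as in Figure 7
```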

Customer 1         DU13      DU12      Total
  Q                0.842     0.866
  C                93.75     50        143.75
  T                0.35      0.25
  VI               200       100
  w                0.33      0.5
  VA               102.25    55.8      158.05
  Net Value        8.5       5.8       14.30

Customer 2         DU13      DU14      Total
  Q                0.842     0.968
  C                93.8      71.88     165.63
  T                0.35      0.46
  VI               180       120
  w                0.33      0.67
  VA               92.03     95.91     187.94
  Net Value        -1.72     24.04     22.31

Customer 3         DU13      DU14      Total
  Q                0.842     0.968
  C                93.75     71.88     165.63
  T                0.35      0.46
  VI               160       140
  w                0.33      0.67
  VA               81.8      111.9     193.70
  Net Value        -11.95    40.02     28.07

Total Net Value (all customers): 64.69

Figure 7 Evaluation of information products (before improvement)

Another picture emerges, however, when net value is viewed in the disaggregate, by looking at individual customers. The value of DU13 to Customers 2 and 3 is less than its cost of production. If either of these customers were to discontinue buying DU13, the consequences would be substantial, since revenues would decline but costs would remain unchanged (unless production of DU13 were terminated). Since the purpose of this framework is to provide a vehicle for improving information manufacturing systems, the example will be extended in that direction. By inspecting Figure 7, it can be determined that quality is near the top of the scale (recall that .842 is a point on a zero-one quality scale and does not imply that only 84.2% of the output is correct). On the other hand, timeliness is rather poor for all information products. This suggests a quality-timeliness tradeoff, which could be achieved by eliminating a time-consuming quality block. Since DU13 is in a deficit position, it should be a quality block which impacts this output; QB3 is such a block. The justification for this is found in Figures 2, 4, 5, and 7. It is further observed from Figure 5 that QB3 is on the critical path for all information products except DU12. A side benefit of this quality-timeliness tradeoff is the avoidance of the 50 unit cost for QB3. The analysis is repeated, and the new bottom line is shown in Figure 8. Not only has the aggregate net total value more than doubled, but each information product now has a positive net value for each customer. Given the parameters of this example, the avoidance of QB3 seems an excellent first step in improving this system. (For a note of caution, see [Chengalur-Smith, Ballou, & Pazer, 1992], where it was shown that this step may be detrimental if there is considerable variability in the input material.)

Customer 1         DU13      DU12      Total
  Q                0.731     0.866
  C                85.4      50        135.41
  T                0.4925    0.25
  VI               200       100
  w                0.33      0.5
  VA               114.39    55.8      170.19
  Net Value        28.99     5.8       34.78

Customer 2         DU13      DU14      Total
  Q                0.731     0.866
  C                85.4      59.4      144.79
  T                0.49      0.58
  VI               180       120
  w                0.33      0.67
  VA               102.95    92.52     195.47
  Net Value        17.55     33.12     50.68

Customer 3         DU13      DU14      Total
  Q                0.731     0.866
  C                85.4      59.4      144.79
  T                0.49      0.58
  VI               160       140
  w                0.33      0.67
  VA               91.51     107.94    199.46
  Net Value        6.11      48.54     54.66

Total Net Value (all customers): 140.12

Figure 8 Evaluation of information products (after improvement)

5. Application of the Information Manufacturing Model: The Case of Optiserve

In this section we apply the model described in this paper to a mission-critical information manufacturing system found in a major optical products company, which we refer to as Optiserve [Lai, 1993; Wang & Firth, 1993]. For expository purposes, we present the relevant portion of the case with a simplified system topology. Three scenarios are discussed: the existing system, the present system with one relatively straightforward change (minor), and a reengineered system (major). Since the most difficult aspect of implementing the model is obtaining estimates for the various parameters it requires, we concentrate on this. The analysis, once the numbers are available, proceeds as demonstrated in Section 4.

5.1 Current System

Optiserve is a large optical products chain with 750 stores located throughout the country. It provides one-stop eye care in that each store provides eye exams, optical products, and fitting of the products. However, grinding of the lens is carried out at four manufacturing laboratories. Our analysis focuses on the express orders which are handled by the main laboratory. Optiserve strives to differentiate itself in several ways, one of the most important being customer service. This is a key factor in Optiserve's mission, which is to "create patients for life." However, at the present time, problems with data quality not only are costing the company over two million dollars on an annual basis but also are resulting in the loss of customers [Wang & Firth, 1993] .

The current information manufacturing system is displayed in Figure 9. Several types of data, modeled as DU1, are obtained and entered onto the Spectacle Rx form, the output of interest in this case. At this stage, the Rx form has patient information (e.g., patient name and telephone number), the glasses prescription (provided by the optometrist), information on the glasses themselves (e.g., frame number and cost, provided by the optician), and additional characteristics such as distance between the patient's pupils provided by the optician. These forms are batched and entered by the optician (represented by PB1) into the store's PC whenever the optician has free time, often at the end of the day. Roughly 80% of the data quality problems arise as a consequence of the above process.

Figure 9 Current Information Manufacturing System

Normally twice a day the optician forwards the day's orders (DU2) to an IBM mainframe based at corporate headquarters in Cleveland. In a process represented by PB2, the IBM queries the item master file (SB1) to determine if the frame ordered is available (DU3). Updating of this file is represented by VB2. The Rx ticket (DU4) is then checked by the IBM for completeness and correctness (QB1). Assuming no problems, it then forwards the Rx ticket (DU5) to an HP computer also based in Cleveland. That computer accesses the frame characteristic file (SB2) to obtain information (DU6) regarding the physical characteristics of the frames ordered (size, shape, etc.) and uses that information to generate the specifications the laboratory will use to actually grind the lenses (PB3). In some cases this cannot be physically done. For example, if the lens is large, the blanks may not be thick enough to accommodate a high curvature. (The process of checking the output of PB3, namely DU7, is modeled by QB2). Assuming no problems, the spectacle Rx ticket (DU8), now complete with grinding instructions, is returned to the IBM (PB4) which routes the Rx ticket (DU9) to the main laboratory (CB1).

It is important to keep in mind that the above scenario captures an information system that supports a manufacturing system. The information product, which is manufactured on a regular, repetitive basis, is the completed Spectacle Rx ticket (DU9) which is delivered to one of the laboratories. The customer for the information product is the laboratory, an internal customer (CB1).

The fraction of Rx tickets that exhibit some data quality problem at some stage is not excessive: 95% of the tickets are completely free of data quality problems. Of those with deficiencies, two-fifths relate to initial data gathering and entry, another two-fifths to frame-related errors, and one-fifth to the inability to match the frame chosen with the prescribed physical requirements for grinding. Errors in 2% of all Rx tickets are never detected until the patient receives the glasses. For the 3% of all Rx tickets where a problem is detected, the optician has to contact the customer, who usually has to come back into the store. As mentioned, this results in a non-trivial financial loss to the company. More importantly, it violates Optiserve's desire to differentiate itself through service, and results in the permanent loss of customers.

Two approaches are available for modeling the quality blocks. One is to assume that the quality block models both error identification and error correction; in this case, the time for the quality block is the weighted average of the times for these two streams. The other approach is to model the flows of good and defective units separately. In the Optiserve case, the inspection time at QB1 and QB2 is trivial and the rework time substantial (days or even weeks). Furthermore, the fraction of cases that require rework is small (.02 or .01), which means that the former alternative would conceal the problem (substantial delay for some customers). Thus we model the defective flows separately, as shown in Figure 10. Although the system in Figure 10 is substantially more complex than that in Figure 9, as will be shown in Subsection 5.3, the additional data requirements are quite modest.
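A small sketch contrasting the two modeling choices is given below. The inspection time (0.0002 day, from Figure 12(d)) and defect rate (0.02) are from the case; the one-day rework time is only a placeholder for "days or even weeks".

```python
# Contrasting the two ways of modeling a quality block. The inspection time
# (0.0002 day) and defect rate (0.02) are from the case; the 1-day rework time
# is only a placeholder for "days or even weeks".

inspect_time = 0.0002
rework_time = 1.0
defect_rate = 0.02

# Option 1: fold rework into the block as a single weighted-average time.
combined_time = (1 - defect_rate) * inspect_time + defect_rate * (inspect_time + rework_time)

# Option 2: model the good and defective flows separately (as in Figure 10).
good_flow_time = inspect_time
rework_flow_time = inspect_time + rework_time

print(round(combined_time, 4))            # ~0.02 day applied to every ticket
print(good_flow_time, rework_flow_time)   # the long delay shows up only where it occurs
```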

Figure 10 Current System with Rework

5.2 Alternative Designs

As indicated, the purpose of our model is to provide information regarding the impact of proposed changes to the information manufacturing system. The Optiserve case exhibits substantial problems along all three dimensions tracked, i.e., quality, timeliness, and cost. Two design alternatives are proposed and discussed. The first involves a relatively straightforward change to the current system which affects data quality directly, and hence indirectly improves timeliness and lowers operational cost. The second involves a reengineering of the system, which substantially improves the timeliness and quality dimensions, and hence the company's goal of superior customer service, but at an increased cost in terms of hardware and software.

The first alternative involves having the optician enter the patient-generated data directly into the PC at the time the patient is in the store. The optician and software in the PC perform the role of a quality block. The optician reads back to the customer key information that has been entered much as an airline reservation clerk repeats flight information to a passenger. The quality block facilitates a comparison by the optician of the new prescription with the previous prescription to detect any obvious problems. (Significant changes in the prescription should be verified with the customer or, if necessary, with the prescribing optometrist.) This could be done only for repeat customers. For new customers, a comparison can be made between the current eye glasses and the prescription. Finally the quality block inspects the Rx form to detect obvious errors such as an invalid code for eye glass frame, and ensures that all fields are filled. The rest of the system is unchanged.

The reengineering option is a decentralized version, which is shown in Figure 11. (Note that the use of the prime notation indicates that the flows and activities have been redefined as part of the reengineering.) The PC in the store has been upgraded to include features such as a math co-processor, and now has the responsibility for most of the computations and processing. The optician still enters the data directly into the PC, and the quality checks instituted in the first alternative are still in place. Most importantly, the computation of the physical characteristics of the lenses formerly performed by the HP computer (that resides in Cleveland) are now performed by the PC in the store while the customer waits. A copy of the item master file (SB1') and frame characteristic file (SB2') reside on the PC, and are updated by a server located at the headquarters periodically as appropriate. (These activities are modeled by VB2' and VB3'.) If any problems are identified, they can be resolved immediately, as the patient is still in the store. (QB1' models error detection and correction.) This results in a major improvement in patient service and would serve to further differentiate Optiserve on the service dimension.

Figure 11: Reengineered Optiserve Information Manufacturing System

If there is no problem, the PC forwards the Rx ticket to the server in the headquarters (which replaces the IBM mainframe), which in turn forwards the ticket to the appropriate laboratory.

In the reengineered system, the server performs the following functions: (1) It maintains the most current version of the item master file and the frame characteristic file. Periodically, it updates the 750 PCs at the store level with the most current version. (2) It keeps track of the inventory level of blank lenses, frames, etc. in the laboratories. Each laboratory periodically reports its actual inventory level to the server for reconciliation purposes. (This accounts for breakage and defective items.) (3) The server routes the spectacle ticket to the appropriate laboratory.

5.3 Data Requirements for the Optiserve Case: Current System

In this subsection we describe how to obtain the data required to implement the information manufacturing model in the case of Optiserve. The data requirements for the higher (macro) level model, shown in Figure 9, are discussed first followed by those for the more detailed (micro) model (Figure 10).

Figures 12(a) through 12(e) present the input data values needed to compute the timeliness, quality, cost, and value characteristics of the Rx ticket - the information product. We describe in detail how values for one row from each subfigure were obtained. Although the other rows are not necessarily handled in the same manner, similar procedures can be applied to obtain the corresponding input values.

5.3.1. Primitive Data Units

Eight descriptors are required for each primitive data unit. As an example, we will consider DU1.

1. Vendor - The vendor is the patient who is the source of the information collected by both the optician and the optometrist.

2. Cost - The cost of securing the patient information has two major components. The cost of the optician's time was approximately $10 while the cost of the optometrist's time was approximately $20 to yield the estimate of $30.

3. Quality - The estimate of the quality of the input is constructed from three sources: the proportion of remakes due to data quality (2%), the proportion of erroneous orders detected at QB2 (1%), and the proportion of erroneous orders detected at QB1 which were attributable to vendor input (1%). Thus, the proportion of "defects" originating from the vendor is closely approximated by 4%, the sum of these three error rates. As noted earlier, the modeling process can incorporate either relative or actual quality measures. In this case, since actual measures are available, the quality of DU1 is 0.96.

4. Input Time - The completion of the collection of patient information is used as the point of reference for this analysis and is consequently set equal to zero.

5. Age - The information is collected by the optician and optometrist over the period of an hour. Therefore, on the average, this information is one-half hour old at the completion of the collection process (t=0). Since the analysis is based on a ten hour day, one-half hour is represented as 0.05 day.

6. Volatility - At first it would appear that much of the information concerning the patient would be of rather low volatility. However, the volatility of the information in this analysis relates to how long the patient will wait before canceling the order (shelf life). After this point the information is useless. Most of the orders are express orders, which implies that receiving the glasses promptly is critical.

7. Shelf-life - It was determined that unless the laboratory received the patient information within five days it would become useless due to cancellation of the order.

8. Timeliness Function - Since cancellations accelerate near the end of the five-day shelf life, an exponent of less than one is required for the timeliness function. An exponent of 1/4, indicating that approximately 67% of the cancellations occur during the fifth day, proved satisfactory.
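A brief check of the 67% figure is sketched below, under the assumption (ours, not stated explicitly in the case) that the timeliness value [max(1 - currency/shelf life, 0)]^s is read as the fraction of orders not yet cancelled at a given currency.

```python
# Checking the "67% during the fifth day" figure, assuming the timeliness value
# (1 - currency/shelf_life)**s is read as the fraction of orders not yet cancelled.
shelf_life, s = 5.0, 0.25
still_active_at_day_4 = (1 - 4 / shelf_life) ** s      # ≈ 0.669
cancelled_before_day_5 = 1 - still_active_at_day_4     # ≈ 0.331
print(round(cancelled_before_day_5, 2))                # so roughly 67% of cancellations fall in day 5
```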

Input   Vendor   Cost     Quality   Input Time   Age        Volatility   Shelf Life   Timeliness Function
DU1     VB1      $30.00   .960      0            .05 day    High         5 days       1/4
DU3     VB2      $.10     .990      0            2.5 days   Medium       60 days      1
DU6     VB3      $.20     1.00      0            10 days    Low          500 days     1

Figure 12(a): Inputs required for the primitive data units
Process   Cost     Time        Quality Function   Delay
PB1       $2.00    .01 day     Qout = Q1          .25 day
PB2       $4.00    .001 day    Qout = Q2 * Q3     .05 day
PB3       $1.00    .0002 day   Qout = Q4 * Q6     .0003 day
PB4       $4.00    .001 day    Qout = Q8          .5 day
(PB5)     $16.30   .3114 day   Qout = Q5          1 day
(PB6)     $22.60   .3126 day   Qout = Q8          2 days

Figure 12(b): Inputs required for the processing blocks
Storage Block   Cost    Time        Delay
SB1             $.10    .0002 day   0
SB2             $.24    .0002 day   0

Figure 12(c): Inputs required for the storage blocks
Quality Block   Cost    Time        Quality Function              Delay
QB1             $.10    .0002 day   Qout = [1 - [1 - Qin](.60)]   0
QB2             $.20    .0005 day   Qout = [1 - [1 - Qin](.67)]   0

Figure 12(d): Inputs required for the quality blocks
Output    Customer   VI     w     a     b
DU9       CB1        1.00   .67   7.0   1.0
(DU12)    CB1        1.00   .67   7.0   1.0
(DU13)    CB1        1.00   .67   7.0   1.0
(DU14)    CB1        1.00   .67   7.0   1.0

Figure 12(e): Inputs required to evaluate the information product

Figure 12: Data for Optiserve Case

5.3.2. Processing Blocks

Four descriptors are required for each processing block. As an example we will discuss PB2.

1. Cost - The cost of the IBM query of the item master file to determine frame availability was estimated to be $4.00. This includes items such as personnel costs.

2. Time - The expected time required by this query is very small compared to the various delays in the system. Such times are represented in the model as 0.001 day which is an upper limit on the time requirement.

3. Quality Function - Unlike the previous example, where the processing was based on aggregation and the corresponding quality function on the averaging of inputs, this process is based on a comparison. Consequently, the output is of acceptable quality if and only if both the information provided by the vendor and the frame availability information are correct. The corresponding quality function is the product of the qualities of DU2 and DU3.

4. Delay - This processing by the IBM is done at one-hour intervals; consequently, the average delay is one-half hour, or 0.05 day, based on a 10-hour day.

5.3.3. Storage Block

For each storage block only three descriptors are required. SB1 will be used as an example.

1. Cost - The cost of storing master file data on the IBM is estimated to be $0.10, which includes the time to retrieve the data and the disk storage cost.

2. Time - This is primarily the time to retrieve the record associated with a patient's frame, which is relatively short, and is estimated to be 0.0002 work day.

3. Delay - The frame availability information is available without delay once the query is received since the database is on-line.

5.3.4. Quality Blocks

Each Quality Block requires four descriptors. QB1 is used as an example.

1. Cost - When the rework operation is included as part of the quality block, the cost is the sum of the actual cost of checking the information quality plus a prorated cost to cover the rework operation for that proportion of the flow which is rejected. When, however, the rework flows are modeled separately, this is simply the cost of the quality check. For this case, the check is performed by the IBM at a cost of $0.10.

2. Time - A similar argument holds for the time estimate. Since in this case the flows are split and modeled separately, the time is quite small and 0.0002 days is used as an upper limit.

3. Quality Function - At this point approximately 5% of the flow is in error (4% from the flow provided by the patient and 1% from errors in the item master file concerning frame availability). The quality check at QB1 focuses only on frame-related data; consequently, only about 2% of the flow (1% relating to errors from the patient flow and 1% from the master file) is detected to be in error and rejected. The remaining 3% of the initial 5% is used in modeling the output quality, Qout, for this block via the parameter 0.60 in the quality function (i.e., .03/.05; see Figure 12(d), QB1, and the small sketch following this list). The specific structure of the quality function is necessitated by the fact that Qout is the proportion that is good. Note that this structure is similar to that used in Figure 4(c).

4. Delay - Since the quality check is made by the IBM as the last step in processing, the delay = 0 for this quality block.
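Using the error rates described in item 3, a minimal sketch of the QB1 quality function follows (the function form is from Figure 12(d); the helper name is ours):

```python
# The QB1 quality function from Figure 12(d): 60% of the incoming error
# passes through undetected (.03/.05), so Qout = 1 - (1 - Qin) * 0.60.

def quality_after_qb1(q_in, passthrough=0.60):
    return 1 - (1 - q_in) * passthrough

# 5% of the flow is in error at this point; 2% is rejected and 3% error remains.
print(round(quality_after_qb1(0.95), 4))   # 0.97
```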

5.3.5 Information Required to Evaluate Information Products

Of the eight descriptors required to evaluate an information product, three are computed while the other five must be specified. In this case, the four products are simply subdivisions of a single flow, reflecting the degree of rework that was required. While they differ in terms of the calculated descriptors of cost and timeliness, they are identical in terms of the five specified descriptors. Note that the effect of the rework process is to bring the flow's quality up to that of the portion of the flow that did not require rework.

1. Customer - The customer for the information product is the laboratory.

2. Intrinsic Value - VI was set to 1.00 so that it could be scaled up or down as a function of spectacle value. This permits flexibility for the analysis.

3. Weighting Factor - Since it was estimated that for laboratory operations quality was approximately twice as important as timeliness, the weight, w, was set equal to 0.67.

4. Data Quality Exponent - Since the value of the information product declines substantially for relatively small data quality problems, an exponent greater than one is indicated. For example, moving from .95 quality to, say .98 quality, although a small quality increase, would be a substantial improvement in the value of the information product. It was estimated that a probability of error of 0.10 would reduce the value of the information product by half, while if the error probability were 0.50 the information product would have no appreciable value. An exponent of a = 7.0 provides a good approximation to these conditions.

5. Timeliness Exponent - Since for this case the shelf-life used in the timeliness function was based on the patient's tolerance for delays, a linear function is used to convert timeliness to value to avoid "double-counting" this factor in the model.

As explained earlier, Figure 10 differs from Figure 9 in that the activities of the data quality blocks are modeled at a finer level of detail. The quality block QB1 identifies only certain problems (e.g., unavailability of the frame ordered). The corrective processing (or rework) activity for those tickets flagged by QB1 is represented by PB5. The rework involves reprocessing the tickets all the way from PB1, and hence the cost for PB5 is obtained by summing the optician's part of the cost of PB1 and the entire cost of PB2. (This estimate is conservative.) The processing time for PB5 is the sum of the times at the previous steps (PB1 and PB2), which must be repeated, plus the delays incurred at these steps, that is, 0.25 and 0.05 day respectively. On average it takes one day for the store to contact the customer and restart the process; hence the delay value for PB5 is one day. Once corrective processing at PB5 is completed, the Rx ticket is handled as is any ticket passing through QB1 that is not flagged. From this it follows that Qout for PB5 is simply the quality of DU5, namely Q5. Analogous statements apply to PB6.

It should be noted that the only additional data values that must be obtained for Figure 10 are the values for PB5 and PB6. This follows from the fact that the corrected tickets are reprocessed from the beginning. Since QB1 and QB2 are independent, it is possible that a given ticket may have to be reworked twice, causing inordinate delay. Although this happens for only one in 5000 tickets (or on average once every two days), over the course of a year, a sizable number of patients would experience this delay, much to the detriment of Optiserve.

5.4 Data Requirements for the Minor and Major Modified Information Manufacturing Systems

Minor

The only difference between the current information manufacturing system and the one with the minor change is the addition of quality control activities right in the store. This change can be modeled by the placement of a quality control block after PB1. For this quality block we do not need to model separately the correction (or rework) process, for corrections can be made immediately while the optician is interacting with the patient. Thus time and cost values for this quality block are weighted averages of the values for the tickets that do not need corrective action and those that do.

The quality block would detect over half the errors resulting from patient-supplied data. Thus the quality function for this block can be approximated conservatively by Qout = [1 - [1 - Qin](0.50)]. This leaves a flow with an error of 2% at this point (1% from the inability to grind the lenses and 1% from still-undetected patient-based errors). The time required for this quality block is a weighted average of the checking time when no problems are present (0.0002 day) and when something needs to be corrected (0.03 day), which yields a value of 0.0008 day (0.0002*0.98 + 0.03*0.02). Similarly, the cost is computed as $0.198 ($0.10*0.98 + $5*0.02). The delay is 0. Because 50% of the initial errors are caught by this additional quality block, the factor 0.60 in the quality function for the old QB1 should be replaced with 0.67 (1/3 of the 3% remaining error is detected), and the factor 0.67 associated with QB2 should be replaced with 0.50 (1/2 of the 2% remaining error is detected). The fact that 1% error still remains reflects the system being unchanged except for the earlier detection of some of the errors.
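The weighted-average figures quoted above can be reproduced as follows (a small sketch; the variable names are ours):

```python
# Reproducing the weighted-average time and cost of the new in-store quality
# block from the figures quoted above.
p_fix = 0.02                          # share of tickets needing on-the-spot correction
time_ok, time_fix = 0.0002, 0.03      # days
cost_ok, cost_fix = 0.10, 5.00        # dollars

block_time = (1 - p_fix) * time_ok + p_fix * time_fix
block_cost = (1 - p_fix) * cost_ok + p_fix * cost_fix
print(round(block_time, 4), round(block_cost, 3))   # ~0.0008 day and $0.198
```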

Major

The reengineered version of the information manufacturing system, shown in Figure 11, is designed to substantially reduce problems resulting from data quality and the time required to deliver the Rx ticket to the laboratories. This leads to improved service quality which in this case is the purpose of the reengineering. Accordingly, we focus on obtaining the quality and timeliness inputs rather than the cost inputs.

This reengineered system is coupled with an enhanced inventory management system that reduces errors in the inventory file (frame master file) from 1% to 0.1%. This implies that the quality associated with DU2' is 0.999. (As mentioned earlier, the use of the prime notation indicates that the flows and activities have been redefined as part of the reengineering.) The direct entry of patient-based data into the PC, coupled with various built-in quality checks, improves the quality of DU1' to 0.98. Thus, Qout for PB1' can be approximated by 0.979 (2% from DU1' and 0.1% from DU2'). The block QB1' models quality control on the output of PB1'. For example, those cases for which lenses cannot be ground due to a mismatch between the frames selected and the prescription are handled by QB1'. Since such cases identified for corrective action by QB1' require interaction with the patient (who is still in the store), the time for QB1' is the weighted average over the 98% of cases that do not require action and the 2% of cases that do; the former take 0.017 day and the latter 0.05 day. The quality of DU5' is 0.998 (0.001 from undetected frame inventory errors and 0.001 from undetected patient-based errors). Since in the reengineered system the Rx ticket is immediately forwarded to PB2', which in turn sends it without delay to the laboratory, the time for PB2' is at most a few minutes, or approximately 0.005 day. For PB2', Qout = Qin.

One key issue in the implementation of the information manufacturing model is the ability to vary between macro and micro representations as the need arises. This section addressed this issue in the context of micro modeling of rework flows coupled with macro modeling of the rest of the system.

A second key issue is the ability to obtain and/or determine the inputs required by the various blocks and flows of the model. The Optiserve case illustrated some of the issues that would be encountered in practice, and how they can be addressed. While only selected examples were given, similar analyses and computational procedures were used to obtain all required parameter values.

6. Summary and Conclusions

We have presented an information manufacturing model that can be used to determine the timeliness, quality, cost, and value of information products. The systems we model have a predefined set of data units which undergo predefined processing activities. Our work is customer driven in that the value of the information products manufactured by the system is determined by the customers of those products.

Our measure for timeliness provides a way to deal with issues related to currency, volatility, and customer requirements in a coherent manner. Given the age of the primitive input data units, their volatility as furnished to the analyst, and the time these data units spend in the system, we can determine the level of timeliness for both primitive data units and units derived from them. Moreover, our timeliness measure handles data whether they are numeric or non-numeric. We have also modeled the data quality measure as a placeholder for the other dimensions of data quality that concern the analyst. Furthermore, we have addressed the implications of the costs of information products and their value to data customers. As presented in the previous sections, many of the issues addressed are unique to the information manufacturing model.

Based on the model constructs, we developed a methodology and software tools to operationalize the analysis. The methodology includes an Information Manufacturing Analysis Matrix that tracks the data units through the various stages or steps of the information manufacturing process. Parameters contained in the matrix are used to compute measures such as the timeliness, cost, and quality of the information products manufactured by the system. An illustrative example has also been presented to illuminate how this methodology can be used to analyze and improve information manufacturing systems. Moreover, the Optiserve case highlights the issues that need to be addressed in applying this methodology in an actual setting.

The Optiserve case focused on a relatively small scale information manufacturing system. For large scale systems, which may contain hundreds of processes, data units, and so forth, a hierarchical modeling approach is required. Under this approach an analyst would model, initially, at a higher (macro) level with each block possibly representing a large number of related activities. That macro model, which would contain a relatively small number of blocks, is then analyzed. Those blocks that for whatever reason require more specific analysis are then replaced with a detailed (micro) model. In fact, this approach was used in the Optiserve case as the activities of the data quality blocks displayed in Figure 9 had to be modeled in greater detail as shown in Figure 10.

One of the benefits of the Information Manufacturing Model is the ability to use it to study the impact on an information system of a changed environment and the efficacy of various options for addressing these changes. For example, suppose that governmental regulations alter the frequency with which information products are required. Proposed changes to the current system can be simulated to verify whether these alterations can, in fact, enable the information product to be delivered when required. The model also provides insights regarding the information quality that would result, together with the associated costs.

This research is particularly timely in light of the current National Information Infrastructure initiative, the industrial trend toward total quality management and business process reengineering, and the technological advances in client-server computing and systems integration. At the intersection of these driving forces is information quality. Ultimately, it is high-quality information products that we need to deliver to the right customer at the right location in a timely and cost-effective manner. This research has been designed to help accomplish this goal.

7. References

[1] Ahituv, N. (1980). A Systematic Approach Toward Assessing the Value of an Information System. MIS Quarterly, 4(4), 61-75.

[2] Bailey, R. (1983). Human Error in Computer Systems. Englewood Cliffs: Prentice-Hall, Inc.

[3] Ballou, D. P. & Pazer, H. L. (1982). The Impact of Inspector Fallibility on the Inspection Policy in Serial Production System. Management Science, 28(4), 387-399.

[4] Ballou, D. P. & Pazer, H. L. (1985). Modeling Data and Process Quality in Multi-input, Multi-output Information Systems. Management Science, 31(2), 150-162.

[5] Ballou, D. P. & Pazer, H. L. (1990). A Framework for the Analysis of Error in Conjunctive, Multi-Criteria, Satisficing Decision Processes. The Journal of the Decision Sciences Institute, 21(4), 752-770.

[6] Ballou, D. P. & Tayi, K. G. (1989). Methodology for Allocating Resources for Data Quality Enhancement. Communications of the ACM, 32(3), 320-329.

[7] Bernard, D., Emery, J., Nolan, R. L., & Scott, R. H. (1977). Charging for Computer Services: Principles and Guidelines. New York: Petrocelli.

[8] Buzacott, J. A. & Shanthikumar, J. G. (1993). Stochastic Models of Manufacturing Systems. Englewood Cliffs: Prentice Hall.

[9] Chengalur-Smith, I., Ballou, D., & Pazer, H. (1992). Dynamically Determine Optimal Inspection Strategies for Serial Production Processes. International Journal of Production Research, 30(1), 169-187.

[10] Crosby, P. B. (1979). Quality is Free. New York: McGraw-Hill.

[11] Cushing, B. E. (1974). A Mathematical Approach to the Analysis and Design of Internal Control Systems. Accounting Review, 49(1), 24-41.

[12] Delone, W. H. & McLean, E. R. (1992). Information Systems Success: The Quest for the Dependent Variable. Information Systems Research, 3(1), 60-95.

[13] Deming, E. W. (1986). Out of the Crisis. Cambridge: Center for Advanced Engineering Study, Massachusetts Institute of Technology.

[14] Figenbaum, A. V. (1991). Total Quality Control. New York: McGraw-Hill.

[15] Firth, C. & Wang, R. Y. (1993). Closing the Data Quality Gap: Using ISO900 to Survey Data Quality Research. TDQM-93-03, The Total Data Quality Management (TDQM) Research Program, MIT Sloan School of Management.

[16] Garvin, D. A. (1988). Managing Quality-The Strategic and Competitive Edge. New York: The Free Press.

[17] Hammer, M. (1990). Reengineering Work: Don't Automate, Obliterate. Harvard Business Review, 90(4), 104-112.

[18] Hilton, R. W. (1979). The Determinants of Cost Information Value: An Illustrative Analysis. Journal of Accounting Research, 17(2), 411-435.

[19] Johnson, J. R., Leitch, R. A., & Neter, J. (1981). Characteristics of Errors in Accounts Receivable and Inventory Audits. Accounting Review, 56(2), 270-293.

[20] Juran, J. M. (1989). Juran on Leadership for Quality: An Executive Handbook. New York: The Free Press.

[21] Kleijnen, J. P. C. (1980). Computers and Profits: Quantifying Financial Benefits of Information. Reading, MA: Addison-Wesley.

[22] Kraimer, K. L., Dutton, W. H., & Northrup, A. (1981). The Management of Information Systems. New York: Columbia University Press.

[23] Kriebel, C. H. & Mikhail, O. (1980). Dynamic Pricing of Resources in Computer Networks. Logistics.

[24] Lai, S. G. (1993). Data Quality Case Study - "Optiserv Limited". Master Thesis, MIT Sloan School of Management.

[25] Laudon, K. C. (1986). Data Quality and Due Process in Large Interorganizational Record Systems. Communications of the ACM, 29(1), 4-11.

[26] Liepins, G. E. & Uppuluri, V. R. R. (Ed.). (1990). Data Quality Control: Theory and Pragmatics. New York: Marcel Dekker, Inc.

[27] Martin, J. (1973). Security, Accuracy, and Privacy in Computer Systems. Englewood Cliffs: Prentice Hall.

[28] Morey, R. C. (1982). Estimating and Improving the Quality of Information in the MIS. Communications of the ACM, 25(5), 337-342.

[29] Pak, S. & Pando, A. (1993). DQA: A Software Tool for Analyzing Data Quality in Data Manufacturing Systems. TDQM-93-10, The Total Data Quality Management (TDQM) Research Program, MIT Sloan School of Management.

[30] Redman, T. C. (1992). Data Quality: Management and Technology. New York: Bantam Books.

[31] Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Products. New York City: Van Nostrand.

[32] Taguchi, G. (1979). Introduction to Off-line Quality Control. Nagoya, Japan: Central Japan Quality Control Association.

[33] Tansel, A. U., et al. (1993). Temporal Databases: Theory, Design, and Implementation. Redwood City: The Benjamin/Cummings Publishing Company, Inc.

[34] Wang, R. Y. & Firth, C. (1993). Using a Flow Model to Analyze the Business Impact of Data Quality. (No. TDQM-93-08). Total Data Quality Management (TDQM) Research Program, MIT Sloan School of Management.

[35] Wang, R. Y. & Kon, H. B. (1993). Towards Total Data Quality Management (TDQM). In Information Technology in Action: Trends and Perspectives. (pp. 179-197). Englewood Cliffs, NJ: Prentice Hall.

[36] Wang, R. Y., Kon, H. B., & Madnick, S. E. (1993). Data Quality Requirements Analysis and Modeling. In the Proceedings of the 9th International Conference on Data Engineering, (pp. 670-677) Vienna: IEEE Computer Society Press.

[37] Wang, R. Y., Reddy, M. P., & Kon, H. B. (1992). Toward Quality Data: An Attribute-based Approach. To appear in the Journal of Decision Support Systems (DSS).