
The Use of Simulated Experts in Evaluating Knowledge Acquisition


P. Compton, P. Preston, B. Kang

School of Computer Science and Engineering

University of New South Wales

Sydney 205, Australia

email compton@spectrum.cs.unsw.oz.au

Abstract

Evaluation of knowledge acquisition methods remains an important goal; however, evaluation of actual knowledge acquisition is difficult because of the unavailability of experts for adequately controlled studies. This paper proposes the use of simulated experts, i.e., other knowledge based systems, as sources of expertise in assessing knowledge acquisition tools. A simulated expert is not as creative or wise as a human expert, but it readily allows for controlled experiments. This method has been used to assess a knowledge acquisition methodology, Ripple Down Rules (RDR), at various levels of expertise and shows that redundancy is not a major problem with RDR.

Introduction

Evaluation of knowledge acquisition (KA) methods remains an important goal. Many KA methods have been proposed and many tools have been developed. However, the critical issue for any developer of knowledge based systems (KBS) is to select the best KA technique for the task in hand. This means that papers describing methods need to provide convincing evidence of the particular advantage of the method over other methods and clear identification of the problems and weaknesses of a method. Unless this clear evidence is provided it is very hard to be sure whether or not to believe the author, who, with the best will in the world, is mainly concerned to highlight the advantages he or she believes are provided by the new method being proposed. As an example, it seems there are still very few case studies of maintenance problems with KBS, e.g. (Bachant and McDermott 1984; Compton, Horn et al. 19). The problem for KA researchers is that they need to demonstrate results on actual KA from experts. Obviously it is expensive to use experts for other than real applications and they are not readily available for controlled studies. The closest to a controlled scientific evaluation of KA so far seems to be Shaw's study of different experts using KSSO (Shaw 1988). However, the aim of this study was to investigate variability in how experts provide knowledge rather than to evaluate a KA method. The study suggested that experts organised their knowledge of the same domain quite differently from each other and that the same expert was likely to vary his or her knowledge organisation on repeat experiments. Clearly any study using experts needs to take into account the variability between experts as well as the difficulty of repeat experiments on the same experts, whereby they become more expert at contributing to a KBS. These are standard problems in empirical science, but they are major stumbling blocks in KA because of the cost and unavailability of experts. Experts by definition are people whose expertise is scarce and valuable.

One solution to this problem is to pick tasks for which many people have significant expertise so that experts are readily available. Little work appears to have been done on this approach and in our own experience it is very difficult to identify suitable tasks. Another approach is simply to report on how methods have been used on a wide range of real world systems. This is useful but less than ideal as it only applies to established methods, not new research, and it is very difficult to quantify and compare. New methods are difficult as one must first convince an organisation of the advantages of using a hitherto untested approach. It normally happens because the developer is part of the organisation, which hardly provides a good controlled environment. However, some standard on how application work should be reported to make comparison possible is clearly desirable.

The major attempts to evaluate KA to date are the Sisyphus projects (Linster 1992; Gaines and Musen 1994). In these projects a sufficient paper specification of a problem has been provided so that a KBS solution could be implemented without further information being required. These studies have been very valuable because they have resulted in papers describing the development of a variety of solutions to the same problem. The basis for comparison has been informal but very interesting, even resulting in joint papers where authors contrast their methods (Fensel and Poeck 1994). However, these papers necessarily have no information on actual KA. All the relevant knowledge was already in a paper specification. What the studies are concerned with is identifying and perhaps building problem solving methods suitable for the described problem, and developing an appropriate domain model and perhaps even a KA tool suited to the problem. They are not concerned with the further process of actually acquiring the knowledge to go into the knowledge base. For a system fully specified on paper in a single small document, acquisition of the knowledge is trivial once the problem solving method and domain model have been developed. However, this does not seem to be the case with real KBS projects.

This paper proposes the notion of using another KBS as a simulated expert from which knowledge can be acquired. The KA method or tool is used to acquire knowledge from the simulated expert and to build a new KBS which should have the same competence as the KBS from which the simulated expert is derived. Instead of asking a human expert what the reasons are for reaching a particular conclusion, etc., one asks the simulated expert, whose source of expertise is a previously built expert system for the domain. The obvious advantage of such an approach is that endless repeat experiments are possible and the experimenter has complete control over all the variables. A weakness of this approach is that a data model is already given or will be very easily derived, whereas with a human expert this may be more difficult. We mean here the data model required for communicating with the user and/or acquiring the data about a particular case to be processed. We are not concerned with further abstraction that may appear attractive and may be useful internally in the KBS. Perhaps a new data model will be developed, but there is an already implemented model and the chances are that the new system will use an identical model. However, this does not seem a major lack, as the development of an appropriate data model is itself a major concern of conventional software engineering and knowledge engineering seems to offer little to this, except that the development of the data model is integrated into the overall knowledge engineering process.

The interesting question of deciding on a problem solving method still remains. The simulated expert KBS of course has a specific problem solving method, but this is not necessarily apparent, nor need it be reproduced in the new KBS. The key issue in using the simulated expert is what type of knowledge it provides. The knowledge provided by the simulated expert essentially comes from its explanation component. The explanation component may provide a way of browsing the system or it may provide explanations that differ from its reasoning in reaching a conclusion, but most likely it is going to provide some sort of activation or rule trace. This further implies a set of cases to exercise the simulated expert KBS. However, if such a KBS exists, the chances are suitable cases can be made available. Because of the likely use of cases to exercise the simulated expert, this approach relates to machine learning evaluation. In machine learning evaluation extensive use has been made of databases of cases. The performance of different methods has been compared by their performance on learning from these databases. Some of these databases are used in the studies described below. The crucial difference between KA and ML evaluation is that ML uses the raw data of the cases to derive a KBS, whereas for KA evaluation the simulated expert's explanation of its conclusion is used to build the new KBS. ML is concerned with identifying important features from data. KA is concerned with organising knowledge about the important features provided by the expert. Clearly different styles of KBS and different explanation facilities are going to provide quite different evaluations of the different strengths and weaknesses of various KA systems. Also, the evaluation may use some or all of the knowledge provided by the simulated expert to provide different levels of expertise.

The major weakness apparent in this approach to evaluation is that the simulated expert has no meta-knowledge. It can't report that it thinks it has told the knowledge engineer everything that is important, and it can't reorganise its knowledge presentation to suit the desires of the knowledge engineer, etc. However, these are also, at least partially, weaknesses of human experts, so it is probably reasonable to use a simulated expert that has no ability in this regard. However, KA methods that rely heavily on the meta-knowledge abilities of the expert will have problems with this approach to evaluation.

Experimental Studies

Aim

This paper describes the application of a simulated expert to evaluating the Ripple Down Rule (RDR) methodology. A frequent question raised with respect to RDR is the level of redundancy in the KBS and the importance of the order in which cases are presented. This question is explored with respect to three different domains, three different simulated expert KBS for each domain, three different levels of expertise and a number of different orderings of the data presented. The three different domains are the Tic-Tac-Toe and Chess End Game problems from the UC Irvine data repository and the Garvan thyroid diagnosis problem, also included in the Irvine repository, but here based on a larger data set. The KBSs used for the simulated expert were built by induction from the same data sets using the C4.5, Induct and the RDR version of Induct machine learning algorithms.

Ripple Down Rules (RDR)

RDR is a KA methodology and a way of structuring knowledge bases which grew out of long term experience of maintaining an expert system (Compton, Horn et al. 19). What became clear from this maintenance experience is that when an expert is asked how they reached a particular conclusion they do not and cannot explain how they reached their conclusion. Rather, they justify that the conclusion is correct, and this justification depends on the context in which it is provided (Compton and Jansen 1990). The justification will vary depending on whether the expert is trying to justify their conclusion to a fellow expert, a trainee, a layperson or a knowledge engineer, etc. This viewpoint on knowledge has much in common with situated cognition critiques of artificial intelligence and expert systems, but here leads to a situated approach to KA.

The RDR approach was developed with the aim of using the knowledge an expert provided only in the context within which it was provided. For rule based systems it was assumed that the context was the sequence of rules which had been evaluated to give a certain conclusion. If the expert disagreed with this conclusion and wished to change the knowledge base so that a different conclusion was reached, knowledge was added in the form of a new rule of whatever generality the expert required, but this rule was only evaluated after the same rules were evaluated with the same outcomes as before. With this approach rules are never removed or corrected, only added. All rules provide a conclusion, but the final output of the system comes from the last rule that was satisfied by the data.
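To make this structure concrete, the following is a minimal sketch of a single-classification RDR knowledge base, assuming a case is simply a dictionary of attribute-value pairs. It is an illustration only, not the authors' implementation; the names (RDRNode, classify) are ours.

```python
# A minimal sketch of a single-classification RDR tree (illustrative names,
# not the authors' implementation). A case is a dict of attribute -> value,
# and a rule is a conjunction of (attribute, value) conditions.

class RDRNode:
    def __init__(self, conditions, conclusion, cornerstone):
        self.conditions = conditions    # set of (attribute, value) pairs
        self.conclusion = conclusion
        self.cornerstone = cornerstone  # the case that prompted this rule
        self.if_true = None             # correction branch: rule fired but was wrong
        self.if_false = None            # alternative branch: rule did not fire

    def satisfied_by(self, case):
        return all(case.get(attr) == value for attr, value in self.conditions)


def classify(root, case, default="no conclusion"):
    """Return (conclusion, last satisfied node, last node evaluated)."""
    conclusion, last_true, last, node = default, None, None, root
    while node is not None:
        last = node
        if node.satisfied_by(case):
            conclusion, last_true = node.conclusion, node
            node = node.if_true     # only a correction added in this context applies
        else:
            node = node.if_false    # otherwise try the next rule at this level
    return conclusion, last_true, last
```

The conclusion returned is that of the last rule satisfied, and the last node evaluated marks the point in the tree where a correcting rule would be attached.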

Initial experiments with this approach were based on rebuilding GARVAN-ES1, an early medical expert system (Horn, Compton et al. 1985; Compton, Horn et al. 19). This system was largely rebuilt as an RDR system and it was demonstrated that rule addition of the order of 20 per hour could be achieved, with very low error rates (Compton and Jansen 1990). It was realised that the error rate could be eliminated by validating the rules as they were added (Compton and Preston 1990). A valid rule is one that will correctly interpret the case for which it is added and not misinterpret any other cases which the system can already interpret correctly. One possibility is to store all the cases seen or their exemplars (an approach used by Gaines in his version of RDRs (Gaines 1991a)) and check that none of these are misinterpreted. The approach on which most RDR work has been based is to check that none of the cases that prompted the addition of other rules are misinterpreted. These cases are stored - they are the "cornerstone cases" that maintain the context of the knowledge base. In fact only one of these cases has to be checked at a time, the case associated with the rule that gave the wrong classification. To ensure a valid rule, the expert is allowed to choose any conjunction of conditions that are true for the new case as long as at least one of these conditions differentiates the new case from the cornerstone case that could be misclassified. To ensure this the expert is shown a difference list to choose from (Compton and Preston 1990), a list of differences between the current case and the case attached to the last true rule.
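As an illustration of this validity check, here is a small sketch of computing a difference list and testing a candidate conjunction of conditions, again assuming cases are attribute-value dictionaries and with illustrative function names.

```python
# A sketch of the difference list and validity check described above.
# Cases are dicts of attribute -> value; a candidate rule is a list of
# (attribute, value) conditions chosen by the expert. Illustrative only.

def difference_list(new_case, cornerstone_case):
    """Conditions true of the new case that do not hold for the cornerstone case."""
    return [(attr, value) for attr, value in new_case.items()
            if cornerstone_case.get(attr) != value]


def is_valid_rule(conditions, new_case, cornerstone_case):
    """A valid rule fires on the new case and includes at least one condition
    that differentiates it from the cornerstone case that would otherwise be
    misclassified."""
    fires_on_new_case = all(new_case.get(attr) == value for attr, value in conditions)
    differences = set(difference_list(new_case, cornerstone_case))
    differentiates = any(condition in differences for condition in conditions)
    return fires_on_new_case and differentiates
```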

RDR have also been used for the PEIRS system described below (Edwards, Compton et al. 1993). They have also been used for a configuration task, but in this case a number of RDR knowledge bases were built inductively and a further algorithm was developed to reason across the various knowledge bases (Mulholland, Preston et al. 1993). The version of RDR used was a simple C implementation running under Unix.

Data Sets

The following data sets were used. Chess and Tic Tac Toe are from the University of California Irvine Data Repository. The Garvan data set comes from the Garvan Institute of Medical Research, Sydney.

Chess: Chess End-Game -- King+Rook versus King+Pawn on a7. 36 attributes for 3196 cases and 2 classifications.

Tic-Tac-Toe: Tic-Tac-Toe Endgame database. This database encodes the complete set of possible board configurations at the end of tic-tac-toe games. 9 attributes for 958 cases and 2 classifications.

Garvan: Thyroid function tests. A large set of data from patient tests relating to thyroid function. These cases were run through the Garvan-ES1 expert system (Horn, Compton et al. 1985) to provide consistent classifications. The goal of any new system would be to reproduce the same classification for the cases. 32 attributes for 21822 cases and 60 different classifications. These are part of a larger data set of 45000 cases covering 10 years. The cases chosen here are from a period when the data profiles did not appear to be changing over time and could be reasonably reordered randomly (Gaines and Compton 1994). The Garvan data in the Irvine data repository is a smaller subset of the same data. The Garvan data consists largely of real numbers representing laboratory results. Using the Garvan-ES1 pre-processor these were reduced to categories of high, low, etc. as used in the rules in the actual knowledge base. The preprocessed data was used in the studies below.

Machine Learning Methods

C4.5 (Quinlan 1992) is a well established machine learning program based on the ID3 algorithm. The extensions to the original ID3 are that it deals with missing data and real numbers, provides pruning, and allows the KBS to be represented as a tree or rules, with some consequent simplification. The version of C4.5 used was provided by Ross Quinlan. It was used with the default settings, as the aim was not to produce the best possible KBS but a reasonable simulated expert. There were no real numbers in the data but a lot of missing data in the Garvan data set. Induct (Gaines 19) is based on Cendrowska's Prism algorithm (Cendrowska 1987). Induct can produce either flat rules or RDRs (Gaines 1991a). Both versions of Induct were used. The versions used were provided by Brian Gaines as part of the KSSn system. The RDR representation is generally more compact (Gaines and Compton 1992). Induct does not handle real numbers at this stage, but deals with missing data and provides pruning. No pruning was used in this study.

Although C4.5 and Induct perform similarly, there are important differences in their underlying algorithms. C4.5 attempts to find an attribute to go at the top of the decision tree whose values best separate the various classifications, as assessed by the information calculation used. This separation is estimated as the best overall separation, so there is no requirement that any leaf should contain only one class or a particular class. This process is repeated with the cases at each leaf. In contrast, Induct selects the most common classification and attempts to find an attribute-value pair that provides the best selector for cases with this classification. Further attribute-value pairs are added to the rule to improve the selection as long as this is statistically warranted. The process is repeated for the remaining cases. The difference with the RDR version of Induct is that it repeats the process separately for cases that are incorrectly selected by the rule and those that the rule does not select, resulting in an RDR tree.
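The covering strategy can be pictured with a rough sketch in the spirit of Prism/Induct; this is not the actual Induct or Prism implementation, and Induct's statistical test for adding conditions is replaced here by a simple no-improvement check. All function names are illustrative.

```python
# A rough sketch of a Prism/Induct-style covering learner (illustrative only).
# Cases are (attributes_dict, label) pairs.

from collections import Counter

def covers(conditions, case):
    return all(case.get(attr) == value for attr, value in conditions)

def learn_flat_rules(cases):
    rules, remaining = [], list(cases)
    while remaining:
        # Pick the most common classification among the remaining cases.
        target = Counter(label for _, label in remaining).most_common(1)[0][0]
        conditions, best_precision = [], 0.0
        while best_precision < 1.0:
            # Try every attribute-value pair not yet in the rule and keep the
            # one that most improves precision on the target class.
            candidates = {(a, v) for case, _ in remaining for a, v in case.items()}
            best = None
            for cond in candidates - set(conditions):
                covered = [(c, l) for c, l in remaining if covers(conditions + [cond], c)]
                if not covered:
                    continue
                precision = sum(l == target for _, l in covered) / len(covered)
                if precision > best_precision:
                    best, best_precision = cond, precision
            if best is None:
                break
            conditions.append(best)
        rules.append((conditions, target))
        # Remove the cases the new rule covers and repeat for the rest.
        remaining = [(c, l) for c, l in remaining if not covers(conditions, c)]
    return rules
```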

Experimental Method

An RDR KBS is built by correcting errors, that is, by adding new rules for cases which have not been given the correct classification. To do this the expert selects relevant conditions from the difference list for that case. The method used here is identical except that any expertise used in selecting important conditions from the difference list is provided from the rule trace of another expert processing the same case. It should not be expected that the simulation will perform better than a real expert or the machine learning techniques on which it rests. The best that can be hoped for is defined by the accuracy of the simulated expert (this essentially becomes a measure of the performance of the base machine learning technique). Real experts do however perform better than machine learning techniques when there are small data sets (Mansuri, Compton et al. 1991), and in general a little knowledge can replace a lot of cases for machine learning (Gaines 1991b). The following steps are required (a sketch of the resulting loop is given after the step list):

Preparation

• Collect a set of cases, and produce a knowledge base using a machine learning mechanism - this becomes the basis for the Simulated Expert (SE) (described more fully later).

• Randomise the data to avoid 'lumpiness'.

• Start with a fresh RDR Expert System (ES), essentially an empty knowledge base.

Processing

Step 1 • get the next case

Step 2 • ask the (simulated) expert for a conclusion

Step 3 • ask the ES under construction for a conclusion

Step 4 • if they agree, get another case and go back to Step 1

Step 5 • if the expert and the ES disagree, make a new (valid) rule and go back to Step 1

Step 5 is the crux. The new rule needs to be constructed and located in the KB. For an RDR simulation, this step consists of:

• Run the case against the ML generated KBS and produce a rule trace. This is essentially the justification for the conclusion.

• Run the case on the developing RDR ES and identify the last rule satisfied and the last rule evaluated. The new rule will be attached to the true or false branch of the last rule evaluated, according to the evaluation. That is, the new rule will be reached only if exactly the same conditions are satisfied.

• Obtain the difference list based on the current case and the case attached to the last true rule. This difference list provides a filter to select valid conditions that can be used in a new rule.

• Using a combination of the expert's justification (the ML KBS rule trace, i.e. the rules satisfied in reaching a conclusion) and the difference list, create a new rule and attach it. The level of expertise of the simulated expert can be varied here by the mix of conditions selected.
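Putting Steps 1-5 together, the simulation loop looks roughly as follows. This is a sketch only, building on the RDRNode, classify and difference_list sketches given earlier; simulated_expert (returning a conclusion and a rule trace from the ML-built KBS) and select_conditions (one of the expertise levels described in the next section) are assumed, illustrative callables.

```python
# A sketch of the simulation loop (Steps 1-5), building on the RDRNode,
# classify and difference_list sketches above. simulated_expert(case) is
# assumed to return (conclusion, rule_trace) from the ML-built KBS, and
# select_conditions(rule_trace, diff_list) implements one expertise level.

def build_rdr(cases, simulated_expert, select_conditions, default="no conclusion"):
    root = None
    for case in cases:                                           # Step 1
        expert_conclusion, rule_trace = simulated_expert(case)   # Step 2
        if root is None:
            es_conclusion, last_true, last = default, None, None
        else:
            es_conclusion, last_true, last = classify(root, case, default)  # Step 3
        if es_conclusion == expert_conclusion:                   # Step 4
            continue
        # Step 5: build a valid rule and attach it in context.
        if last_true is None:
            diff_list = list(case.items())    # no rule fired: the whole case
        else:
            diff_list = difference_list(case, last_true.cornerstone)
        conditions = set(select_conditions(rule_trace, diff_list))
        new_node = RDRNode(conditions, expert_conclusion, case)
        if root is None:
            root = new_node
        elif last.satisfied_by(case):
            last.if_true = new_node      # correction to a rule that fired wrongly
        else:
            last.if_false = new_node     # alternative to a rule that did not fire
    return root
```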

Analysis

• An examination of the simulation results, and consideration of various metrics that support or do not support the aim of the simulation.

The Simulated Expert

The human expert using an RDR system selects conditions from the difference list to go into a new rule. The simulated expert is simply the mechanism for similarly selecting conditions from the difference list to go into a new rule. The conclusion of the new rule is the conclusion specified in the database for that particular case. Note that if no rule is satisfied the difference list includes all the features of the present case. Three levels of expertise were used in this study.

Smartest (S1): choosing four conditions from the intersection of the ML KBS rule trace for the case and the difference list for the case. These conditions were selected from the top of the list rather than randomly. If the top conditions were less important and a correction had to be applied, the correction difference list would cover the more important conditions lower in the original list. If the intersection contains fewer than four conditions, all the conditions it contains are selected. If it is empty, four conditions are chosen from the difference list. In fact selecting four conditions gives the entire intersection in nearly all cases (Fig 1).

Smart (S2): choosing a single condition from the intersection of the ML KBS rule trace for the case and the difference list for the case. If the intersection is empty, one condition is chosen from the difference list.

Dumb (D): choosing all conditions from the difference list without reference to the ML KBS.

The ML KBS referred to is a KBS built using the entire data set and one of the machine learning algorithms. The entire data set is used to ensure complete expertise, which is then weakened by the selection of conditions as above.
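A minimal sketch of these three selection policies, assuming the rule trace and the difference list are both ordered lists of (attribute, value) conditions, is given below; the function names are illustrative.

```python
# Sketches of the three expertise levels. Both the ML KBS rule trace and the
# difference list are assumed to be lists of (attribute, value) conditions,
# ordered as described in the text. Illustrative only.

def smartest(rule_trace, diff_list, max_conditions=4):
    """S1: up to four conditions from the intersection of the rule trace and
    the difference list, taken from the top of the list; fall back to the top
    of the difference list if the intersection is empty."""
    trace_set = set(rule_trace)
    intersection = [cond for cond in diff_list if cond in trace_set]
    return intersection[:max_conditions] if intersection else diff_list[:max_conditions]

def smart(rule_trace, diff_list):
    """S2: a single condition from the intersection, or a single condition
    from the difference list if the intersection is empty."""
    return smartest(rule_trace, diff_list, max_conditions=1)

def dumb(rule_trace, diff_list):
    """D: every condition in the difference list, ignoring the rule trace."""
    return list(diff_list)
```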

Obviously there are many other ways in which the expertise can be varied; however, these provide a crude separation of three levels of expertise. The third is trivial and ensures that there will be a rule for every different case profile in the database. Figs 1 and 3-5 indicate that the discrepancy between the smartest and the smart expert is not as great as one might expect. Firstly, there are frequently zero conditions in the intersection of the difference list and the ML KBS rule trace. In this case conditions are selected from the difference list. Secondly, there are hardly ever four conditions in the intersection. In this case the entire intersection is used. These problems are worst with the Garvan data set (Figs 1 and 4), whereas with the other data sets there is a greater difference between the smartest and the smart expert. A reason for zero conditions in the intersection of the difference list and the ML rule trace can be an empty rule trace. A case that is misclassified by the RDR KBS may actually have the default classification, so that there is no rule trace from the ML KBS. Note that conditions selected from the difference list are guaranteed to produce a valid rule that will correctly classify the present case but not misclassify other cornerstone cases; this is a strength of RDR. However, the rule, though valid, may be fairly stupid. Where no rules are satisfied in the RDR KB and the difference list contains the entire case, the selection of conditions is more arbitrary but still determined by the ML KBS rule trace. Where the case has the default classification, so that there is no ML KBS rule trace, the selection of conditions is very arbitrary, particularly since conditions are always selected from the top of the list.

The rule traces used come from three KBSs developed by three machine learning systems:

• an Induct/RDR knowledge base (IR)

• an Induct knowledge base (II) (a series of conventional independent rules without linkages as in RDR)

• a C4.5 knowledge base (C4)

Fig 1. The number of conditions in an Induct/RDR rule trace for cases from a sample of the Garvan data set (number of cases against number of conditions).

Figs 3-5 show the performance of the various RDR KBS built, one figure for each data set. The graphs show the increasing size and the improving performance as more and more of the training data is tested on the developing RDR systems. The three horizontal pairs of graphs on each figure show the results for the three different ML methods used to build the simulated experts. The shaded sections on the graphs indicate the range of results for the randomised data for the smartest and smart experts. Since the stupid expert KBS is independent of the ML method used, it is shown only once in the top panel for each data set and results are shown for only one randomisation. The other single line on each of the graphs represents the performance of the ML method when also given more and more of the training data. Only a single line is shown for clarity, but the full randomised data for the ML methods are given in Fig 2.

In terms of size, the Induct/RDR ML KBS are smaller than the smartest and smart RDR systems for all three data sets. However, for straight Induct and C4.5 the RDR systems are as small as the ML systems, except for the Tictactoe domain where the Induct KB is clearly smaller than the RDR KBs. These are fairly surprising findings. It is frequently commented that RDR are likely to produce large, badly organised KBS because there is no control over the order in which rules are added, and since rules are used only in context the same rules may have to be reproduced in many places in the KBS. On the other hand it is generally assumed that ML methods will produce very compact if not optimally compact KBs; in fact the point of induction is to produce a compact representation. These results are therefore strongly supportive of the idea that RDR systems are suitable for practical applications and that repetition in the KB is a comparatively minor problem. On the other hand the results also show that lesser expertise results in larger knowledge bases, with presumably more corrections of less appropriate rules. However, there is no doubt that there is repetition, because the ML Induct/RDR KBs are smaller than the manual RDR KBs. However, the difference in size is the same as between Induct/RDR and the other ML methods.

The error results show that in general the RDR KBS have errors comparable to the ML KBS, except for Induct where the RDR systems have fewer errors. However, from Fig 2 it is not so much that the RDR systems have fewer errors than Induct, but that Induct has greater errors than the other ML methods. However, as noted before, this may well be due to less than optimal use of Induct. There is also a tendency for the RDR systems to outperform induction when there are only small numbers of training cases, as ML methods have difficulty with small training sets. This is most noticeable in the Tictactoe data, presumably because this dataset has the smallest number of cases. It is least noticeable with the Garvan dataset, with the largest number of cases. Note that the ML KBS used for the simulated experts have been built using the entire data sets and so should be able to provide greater expertise to the RDR systems than attempting to apply ML to a small number of cases. These results are consistent with earlier findings that human experts outperformed ML with small amounts of data (Mansuri, Compton et al. 1991). This earlier study showed a much greater outperformance of the human expert building RDR (12% errors) over the ML method (ID3) (72% errors) after training on 291 cases. This discrepancy can be explained because these cases covered all 60 classifications (Garvan data), resulting in very small numbers of training cases for each class. A human expert should perform better than the simulated experts because of the problems alluded to earlier of the intersection between the difference list and the ML rule trace often being empty and the consequent inappropriate selection of conditions from the top of the difference list, often the complete case data. This is a far cry from true human expertise.

Fig 2. Knowledge Base size and accuracy for ML KBS (Induct, C4.5 and Induct/RDR; knowledge base size and errors against percentage of total cases for the Chess, Tictactoe and Garvan data sets).

Fig 3. Knowledge base size and accuracy for KBS built for the Chess domain (Induct/RDR, Induct and C4.5 simulated experts; most, medium and least expertise compared with induction; number of rules and errors against percentage of total cases).

Fig 4. Knowledge base size and accuracy for KBS built for the Garvan domain (Induct/RDR, Induct and C4.5 simulated experts; most, medium and least expertise compared with induction; number of rules and errors against percentage of total cases).

Fig 5. Knowledge base size and accuracy for KBS built for the Tictactoe domain (Induct/RDR, Induct and C4.5 simulated experts; most, medium and least expertise compared with induction; number of rules and errors against percentage of total cases).

Discussion

RDR Evaluation

These results demonstrate that manual RDR expert systems produce a KB of similar size and performance to inductively built expert systems. Since inductively built KBs are expected to be reasonably compact, there is no evidence to suggest that RDR have a fundamental flaw in the repetition and disorganisation of the KB they might produce. The exception to this is that manual RDR systems are from 2.5 to 4 times as large as the inductively built Induct/RDR KBs. Even if this were the experience with human experts, we suggest that this would be acceptable to the experts because of the ease of building RDR systems, and certainly the results show that the performance of these systems is acceptable. The results also show that the size of the RDR knowledge base depends on the level of expertise, so we suggest that when genuine human experts with better expertise are used, smaller knowledge bases will result. Certainly there have been no complaints from the expert building PEIRS that after 2000 rules he feels that he is adding the same knowledge repeatedly. Note that the PEIRS expert spends about 15 minutes per day adding rules, so that even if there is some repetition it is unlikely to be seen as a significant problem.

The critical issue is why this simple approach of constantly patching errors produces reasonably compact knowledge bases, and why greater expertise produces smaller knowledge bases. The answer seems to lie in the difference between trying to make a rule and trying to build a decision tree. With RDR the expert, human or simulated, builds rules. Each addition to the knowledge base is a rule to correctly classify the case in hand and any other cases like it. Ideally, a classification rule will cover all the members of the class but none of the members of other classes. Repetition in the KB will occur where this goal is not achieved and there are false positives and negatives to be handled by further rules. False positives (in this context) arise where a rule is too general and some of the cases that fall under it have to be corrected by further rules (which also occur elsewhere in the KB). False negatives (in this context) will occur if the expert makes rules that are too specific, so that more rules have to be added (which themselves subsume earlier rules).

Although the rule provided by the expert can be viewed as being added to a binary tree, RDR should not be confused with trying to build a decision tree. Normally with a decision tree, the top nodes do not immediately reach a classification and it is assumed that the classification will be produced after further splitting at lower nodes. It is generally (but not necessarily) assumed that each node will refer to a single attribute and that the branches correspond to each value of the attribute. In this framework the choice of which decisions go at the top is critical, and algorithms like ID3 attempt to find the best attribute. Another way of looking at this is that when a single attribute decision tree node is selected there is no way of controlling false positives until further nodes are added. The values for the attribute split the population and that is that. In contrast, with RDR, each rule attempts to deal with a particular class, so each rule can be selected to limit both false positives and false negatives. In particular, further conditions can be added to the rule to limit false positives. If we assume that each rule is perfect and there are no false positives or negatives, it does not matter whether the initial rules deal with classes with many members or few. If a rule (using attribute-value pair conditions) is specific to the class, then cases for other classes will fail to satisfy the rule and will result in new rules being created. The order doesn't matter as long as the rules have no false positives. As false positives are introduced, the order in which cases occur and rules are added becomes more critical. A rule with irrelevant conditions will have a lot of false positives. This suggests that the higher the level of expertise, the less important the order in which rules are added is to the final size.

For an expert to build a decision tree he or she must think about the whole domain in selecting an attribute to go at the top of the tree. With RDR the expert is asked simply to justify their conclusion in context, in the same way as in normal human discourse. An expert is someone who knows the right conclusion and is likely to provide a justification that is both as general and as error free as possible in the context; i.e. it implicitly minimises the false positives and false negatives in the context; i.e. it is a good practical explanation. The simulated expert using the ML rule trace also does this to some degree. The RDR approach thus provides a framework where the natural performance of an expert will tend to produce a compact KB. The expert's response will also probably encourage the development of the KBS in a way that is appropriate for the domain. In a domain where false positives are problematic the expert will tend to give more specific rules, resulting in gradual but very accurate coverage of the domain. Where false negatives are not desirable the expert will tend to make more general rules, resulting in the need for greater correction.

The results for PEIRS suggest human experts tend to produce rules that are very general, and so would produce few false negatives, while at the same time producing few false positives needing further correction (Compton, Edwards et al. 1991). The rules tend to have only one or two conditions in them and on average only two to three rules have to be satisfied to reach the correct conclusion. The resulting structure is closer to a long decision list, with each rule having a further decision list refinement, than to a tree. The depth of decision list corrections is only two or three but the length of the decision lists is about 50.

Of course some repetition does occur, as shown in the results. This repetition can be dealt with by Gaines' strategy (Gaines 1991a) as discussed. However, reorganisation of the KB is likely to be a secondary or rare requirement, with the results here not indicating any pressing need to reorganise the KB.

Since RDR have been compared here to induction, the question arises of why not simply use induction. For induction all the training cases need to be well classified by an expert prior to use, and in many domains such cases are not available. For RDR the cases only need to be well classified by the expert when they are used to add a rule to the system. Hence, a good way of preparing cases for induction would be to build an RDR system, which produces a version of the required KB in the process. As the PEIRS experience shows, this can occur while the developing KB is in routine use.

A conclusion from this work is that the effort that is put into trying to organise knowledge into an optimal model is often unnecessary. It is perfectly adequate merely to keep patching errors with local corrections. Not only is this practical, but the task does not require a knowledge engineer or knowledge engineering skills, thus transforming the practical possibilities of KBSs.

Finally, this is seen as a counter-intuitive result, probably because of the underlying Platonic motivation in much AI of finding the right model, the right representation or the right knowledge. Something that works and is simple, but is not elegant and demanding, is perhaps not seen as attractive. The situated cognition perspective, on the other hand, suggests that since the knowledge experts construct is always going to be justification constructed on the fly in context, a local patch approach is an appropriate solution.

The critical question for the RDR approach is whether a local patch approach can be found for tasks other than single classification. A solution has been found for multiple classification problems (Kang, Compton et al. 1994; Kang, Compton et al. 1995) and the size of the knowledge acquisition task is similar to ordinary RDR. It remains a (hopeful) conjecture as to whether this can be used as a basis for other tasks, removing the need for task analysis (Compton, Kang et al. 1993). More radically, it has been suggested that testing and repairing KBs is a more important and more fundamental issue in developing KBs than analysing the tasks to be carried out (Menzies and Compton 1995).

Simulated Experts

Most KA research at present is concerned with modelling the problem solving activity required for a particular task and the task domain. The KA method may provide a framework in which this is approached, or it might provide a way of developing KA tools specific to the problem, and this in turn may be carried out by building a tool de novo, by assembling a tool from components, or by modifying existing tools. Evaluation of these methods, such as in the Sisyphus studies, has primarily focussed on trying to make clear the different ways in which these tools and methods work and their relative ease of use. Very little attention has been paid to the task of actually populating or maintaining the knowledge base. It seems to be assumed that if the initial analysis and tool building is done well enough this will become a trivial problem. To us a twofold evaluation seems necessary: an evaluation of the ease of building the shell for the task, and an evaluation of the ease and efficiency of populating the KB and its performance.

In theory this approach can be used to populate any knowledge base once a shell has been set up. To populate a knowledge base there must be some way of systematically getting the expert to provide expertise covering the domain. In the studies here cases were used, and no order in the availability of the cases was required. We believe that eventually all knowledge acquisition reduces to asking an expert, at least implicitly, to deal with cases. These can be real cases that the KBS fails to deal with, or synthetic cases constructed by the knowledge engineer or automatically, asking the expert "what if . . . ?". There is no reason why in the evaluation method here one cannot use synthetic cases or order the cases in some way. Similarly there is no reason why algorithms cannot be set to answer questions from the simulated expert KBS such as: "Which attribute is most important in distinguishing between conclusion A and conclusion B?" or "Why was this conclusion made?" Such a framework would seem possible for all KBS, albeit with varying difficulty depending on the interaction with the expert required.

Clearly the answers to these questions by a simulated expert will not be as good as answers from real experts, just as there were problems with the quality of the simulated experts here. However, the issue is whether one can provide useful data for comparisons. In the study here the use of a simulated expert approach made it possible to build 180 different RDR systems (3 domains x 3 ML experts x 2 levels of expertise x 9 randomisations) and provide data on the repetition in and performance of RDR KBs. This approach allows one to compare different KA methods and to modify the type and level of expertise provided. The differences in the performance of methods should be readily identifiable and the simulated expert algorithm readily modified to explore these differences, even if the expertise is not ideal. The expertise will be less than and different from human expertise, but the volume and variety of data that can be produced is quite impossible using real experts. We suggest that the simulated expert approach is not only suitable but is probably the only way in which to evaluate how different methods and tools facilitate populating knowledge bases.

Acknowledgments

This work has been funded by the Australian Research Council. The authors are grateful to Brian Gaines and Ross Quinlan for making their software available, and to Brian Gaines for his assistance in preparing the Garvan data set.

Bibliography

Bachant, J. and McDermott, J. (1984). R1 revisited: four years in the trenches. The AI Magazine Fall: 21-32.

Catlett, J. (1992). Ripple-Down-Rules as a Mediating Representation in Interactive Induction. Proceedings of the Second Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop, Kobe, Japan, 155-170.

Cendrowska, J. (1987). An algorithm for inducing modular rules. International Journal of Man-Machine Studies 27(4): 349-370.

Compton, P., Edwards, G., Kang, B., Lazarus, L., Malor, R., Menzies, T., Preston, P., Srinivasan, A. and Sammut, C. (1991). Ripple down rules: possibilities and limitations. 6th Banff AAAI Knowledge Acquisition for Knowledge Based Systems Workshop, Banff, 6.1-6.18.

Compton, P., Horn, R., Quinlan, R. and Lazarus, L. (19). Maintaining an expert system, in J. R. Quinlan (Ed.), Applications of Expert Systems. London, Addison Wesley. 366-385.

Compton, P., Kang, B., Preston, P. and Mulholland, M. (1993). Knowledge Acquisition without Analysis, in N. Aussenac, G. Boy, B. Gaines et al. (Eds.), Knowledge Acquisition for Knowledge Based Systems. Lecture Notes in AI (723). Berlin, Springer Verlag. 278-299.

Compton, P. and Preston, P. (1990). A minimal context based knowledge acquisition system. "Knowledge Acquisition: Practical Tools and Techniques" AAAI-90 Workshop, Boston.

Compton, P. J. and Jansen, R. (1990). A philosophical basis for knowledge acquisition. Knowledge Acquisition 2: 241-257. (Proceedings of the 3rd European Knowledge Acquisition for Knowledge-Based Systems Workshop, Paris 19, pp 75-)

Edwards, G., Compton, P., Malor, R., Srinivasan, A. and Lazarus, L. (1993). PEIRS: a pathologist maintained expert system for the interpretation of chemical pathology reports. Pathology 25: 27-34.

Fensel, D. and Poeck, K. (1994). A Comparison of Two Approaches to Model-Based Knowledge Acquisition, in L. Steels, G. Schreiber and W. Van de Velde (Eds.), A Future for Knowledge Acquisition: Proceedings of EKAW'94. Berlin, Springer-Verlag. 46-62.

Gaines, B. (19). An ounce of knowledge is worth a ton of data: quantitative studies of the trade-off between expertise and data based on statistically well-founded empirical induction. Proceedings of the Sixth International Workshop on Machine Learning, San Mateo, California, Morgan Kaufmann. 156-159.

Gaines, B. (1991a). Induction and visualisation of rules with exceptions. 6th AAAI Knowledge Acquisition for Knowledge Based Systems Workshop, Banff, 7.1-7.17.

Gaines, B. (1991b). The tradeoff between knowledge and data in data acquisition, in G. Piatetsky-Shapiro and W. Frawley (Eds.), Knowledge Discovery in Databases. Cambridge, Massachusetts, MIT Press. 491-505.

Gaines, B. and Compton, P. (1994). Induction of Meta-Knowledge about Knowledge Discovery. IEEE Transactions on Knowledge and Data Engineering 5(6): 990-992.

Gaines, B. and Musen, M., Eds. (1994). Proceedings of the 8th AAAI-Sponsored Banff Knowledge Acquisition for Knowledge-Based Systems Workshop. Banff, Canada.

Gaines, B. R. and Compton, P. J. (1992). Induction of Ripple Down Rules. AI '92. Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, World Scientific, Singapore. 349-3

Horn, K., Compton, P. J., Lazarus, L. and Quinlan, J. R. (1985). An expert system for the interpretation of thyroid assays in a clinical laboratory. Aust Comput J 17(1): 7-11.

Kang, B., Compton, P. and Preston, P. (1994). Multiple Classification Ripple Down Rules. Proceedings of the Third Japanese Knowledge Acquisition for Knowledge-Based Systems Workshop (JKAW'94), Tokyo, Japanese Society for Artificial Intelligence. 197-212.

Kang, B., Compton, P. and Preston, P. (1995). Multiple Classification Ripple Down Rules: Evaluation and Possibilities. Proceedings of the 9th AAAI-Sponsored Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada, in press.

Linster, M., Ed. (1992). Sisyphus'91: Models of Problem Solving. GMD (Gesellschaft für Mathematik und Datenverarbeitung mbH).

Mansuri, Y., Compton, P. and Sammut, C. (1991). A comparison of a manual knowledge acquisition method and an inductive learning method. Australian Workshop on Knowledge Acquisition for Knowledge Based Systems, Pokolbin, 114-132.

Menzies, T. and Compton, P. (1995). The (Extensive) Implications of Evaluation on the Development of Knowledge Based Systems. Proceedings of the 9th AAAI-Sponsored Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, in press.

Mulholland, M., Preston, P., Hibbert, B. and Compton, P. (1993). An expert system for ion chromatography developed using machine learning and knowledge in context. Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Edinburgh, 12 pages.

Quinlan, J. (1992). C4.5: Programs for Machine Learning. Morgan Kaufmann.

Shaw, M. (1988). Validation in a knowledge acquisition system with multiple experts. Proceedings of the International Conference on Fifth Generation Computer Systems, Tokyo, 1259-1266.
