Casebase Reasoning (CBR)
Reasoning by cases. If I eat at a lot of restaurants and someone asks me for a good restaurant to go to, I'll probably ask that person questions in order to narrow things down to one or a few of the restaurants I know about. What kind of cuisine do you like? What is your price range? Is wine important? And so forth.
A casebase is simply a large set of cases. Each case describes one particular item using a set of features or, in our case, questions. In the above example I might have a restaurant with these values for the questions: cuisine is Mexican, price range is low, wine is no. (I'm being a bit simplistic here.) The result of this case could be Taco Bell.
Some cases care about some questions and some care about other questions. That is, the questions for different cases do not have to be the same; they may or may not overlap. Taken together, the questions of all the cases in the casebase are the features, or questions, of the casebase as a whole.
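To make the idea concrete, here is a minimal Python sketch of a casebase. The names Case, casebase and casebase_questions are made up for illustration and are not Jnana's API, and the second restaurant is a hypothetical case added only to show that cases need not share the same questions.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One item in the casebase: a result plus the answers that describe it."""
    result: str
    features: dict  # question name -> expected answer


# Illustrative data only. "Chez Example" is a hypothetical second case.
casebase = [
    Case("Taco Bell", {"cuisine": "mexican", "price_range": "low", "wine": "no"}),
    Case("Chez Example", {"cuisine": "french", "wine": "yes"}),
]


def casebase_questions(cases):
    """Return the union of every question used by any case, sorted by name."""
    questions = set()
    for case in cases:
        questions.update(case.features)
    return sorted(questions)


print(casebase_questions(casebase))
```

Note that the second case never asks about price range, yet price range is still one of the questions of the casebase as a whole.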
The interesting thing about cases and their features is that features do not have to match exactly. In our restaurant example I could use numbers for the price range. The Taco Bell number might be 6 (for $6). If we asked the user how much they want to spend and they answered 5, we would probably consider the Taco Bell case to be very close for that answer.
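One simple way to express "close" for a numeric feature is a similarity that falls off with distance. The sketch below uses a linear falloff, which is an assumption chosen for illustration; this post does not spell out the function Jnana actually uses, and the name numeric_similarity and the max_distance parameter are hypothetical.

```python
def numeric_similarity(answer, case_value, max_distance):
    """Return 1.0 for an exact match, falling linearly to 0.0 at max_distance.

    Illustrative only: a linear falloff is an assumption, not Jnana's formula.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    distance = abs(answer - case_value)
    return max(0.0, 1.0 - distance / max_distance)


# The user wants to spend 5; the Taco Bell case is priced at 6.
print(numeric_similarity(5, 6, max_distance=4))
```

With a max_distance of 4, an answer of 5 against a case value of 6 scores 0.75, while an answer of 10 or more scores 0.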
Some cases may have a feature that, if it matches the user's answer, can be considered an overwhelming condition for the case to match. We call that a confirm feature. Similarly, there may be a reject feature: if the feature doesn't match the user's answer, then that case will no longer be considered at all.
Now suppose you had cases that asked mutually exclusive questions. For example, one asks whether you are between 15 and 18 years old, another asks whether you are between 21 and 25 years old, and so forth. If one of those questions is answered in the affirmative, then it makes little sense to ask any of the other age questions. You may create constraints that specify these exclusivity characteristics, which is a very useful thing to do.
Jnana's casebase reasoning system allows you to create an application out of cases. You don't have to worry about how to run it. Jnana uses internal algorithms to ask questions in as intelligent a manner as it can. (This applies to other forms of logic that Jnana supports as well.)
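A rough sketch of how confirm and reject features could screen a single case follows. The Feature class and screen_case function are hypothetical names, and letting a reject win over a confirm when both apply is an assumption; the post does not say how Jnana orders the two.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    question: str
    value: str
    confirm: bool = False  # a match here is overwhelming evidence for the case
    reject: bool = False   # a mismatch here removes the case from consideration


def screen_case(features, answers):
    """Return 'rejected', 'confirmed', or 'undecided' for one case.

    Assumption for illustration: rejects are checked before confirms.
    """
    for feature in features:
        if feature.reject and feature.question in answers:
            if answers[feature.question] != feature.value:
                return "rejected"
    for feature in features:
        if feature.confirm and feature.question in answers:
            if answers[feature.question] == feature.value:
                return "confirmed"
    return "undecided"


taco_bell = [
    Feature("cuisine", "mexican", reject=True),
    Feature("drive_through", "yes", confirm=True),  # hypothetical question
]
print(screen_case(taco_bell, {"cuisine": "french"}))
```

Unanswered questions decide nothing in this sketch, so a case stays undecided until the user answers one of its confirm or reject questions.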
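An exclusivity constraint can be pictured as a group of questions of which at most one can be true. The sketch below is an illustration under that assumption; remaining_questions and the question names are hypothetical, and Jnana's own constraint mechanism may look quite different.

```python
def remaining_questions(pending, answers, exclusive_groups):
    """Drop questions that are already answered or ruled out by exclusivity.

    exclusive_groups is a list of sets of question names; within a group,
    a "yes" to one question makes the others pointless to ask.
    """
    skip = set()
    for group in exclusive_groups:
        if any(answers.get(question) == "yes" for question in group):
            skip.update(group)
    return [q for q in pending if q not in answers and q not in skip]


pending = ["age_15_18", "age_21_25", "cuisine"]
age_groups = [{"age_15_18", "age_21_25"}]
print(remaining_questions(pending, {"age_15_18": "yes"}, age_groups))
```

A "no" answer does not trigger the constraint, so the other age questions stay in the list until one of them is answered "yes".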
Advanced
The Match Threshold: A casebase has a threshold that is used to determine if a case can be considered to match. By default this match threshold is 100%. This means that a case must match exactly in order to be produced as a result.
Unanswered questions/features: When determining the match score of a case, if there are features of the case that the user hasn't answered yet, those features can be either ignored or discounted. If ignored, they make no contribution to the match score for the case. If discounted, the feature's unanswered value is used as part of the match score.
Confirm Ratio: When the system is considering a set of cases, it examines those cases and determines the ratio of how many of them use a confirm question. If that ratio is above the casebase's confirm ratio, then that question is likely to be asked of the user next. The purpose of this is to avoid focusing in on features that only discriminate between one case and the rest of the cases before it really makes sense to do so.
Min and Max number of Cases: The casebase will attempt to produce at least the minimum number of cases specified and no more than the maximum. By default the minimum is set to 1, so as soon as one case matches we have succeeded. If the casebase has a tendency to produce a lot of matches at once, then the maximum number may be used to reduce the number that have matched (the lowest-scored cases are eliminated until the number of matching cases is lowered to the maximum).
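The interplay of threshold, minimum and maximum can be sketched as follows. This is an illustration, not Jnana's code: select_results and its parameter names are hypothetical, and scores are assumed to be percentages.

```python
def select_results(scored_cases, threshold=100.0, min_cases=1, max_cases=None):
    """Pick matching cases and report whether enough have been found.

    scored_cases is a list of (case_name, score_percent) pairs.
    Returns (matches, done), with matches sorted from best to worst.
    """
    matches = sorted(
        (pair for pair in scored_cases if pair[1] >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if max_cases is not None:
        # Eliminate the lowest-scored matches beyond the maximum.
        matches = matches[:max_cases]
    done = len(matches) >= min_cases
    return matches, done


scored = [("A", 100.0), ("B", 80.0), ("C", 95.0), ("D", 90.0)]
print(select_results(scored, threshold=85.0, max_cases=2))
```

With a threshold of 85 and a maximum of 2, cases A, C and D all match, and D is then dropped as the lowest-scored of the three.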
Feature Parameters: Each feature on the casebase has a number of parameters used in scoring the feature and the case. There is a match value that is used if the feature matches exactly. A mismatch value is used if the feature completely mismatches. If the feature is unanswered, the unanswered value is used (if discounting of unanswered features is set on the casebase). If a feature partially matches, a value somewhere between the match and mismatch values will be used. These numbers are between -10 and 10.
The confirm/reject attributes of a feature, mentioned earlier, are also feature parameters.
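Putting the match, mismatch and unanswered values together with the ignore/discount setting, a case score might be computed along these lines. Everything here is an assumption made for illustration: the post does not give Jnana's formula, so the sketch interpolates linearly for partial matches and expresses the score as a percentage of the best possible total. FeatureParams, feature_value and case_score are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FeatureParams:
    match: float = 10.0       # used on an exact match
    mismatch: float = -10.0   # used on a complete mismatch
    unanswered: float = 0.0   # used when discounting unanswered features


def feature_value(params, similarity):
    """Interpolate between mismatch (similarity 0.0) and match (similarity 1.0)."""
    return params.mismatch + similarity * (params.match - params.mismatch)


def case_score(features, similarities, discount_unanswered):
    """Score a case as a percentage of its best possible total.

    features: question -> FeatureParams for this case.
    similarities: question -> 0.0..1.0 for the questions answered so far.
    """
    earned = 0.0
    best = 0.0
    for question, params in features.items():
        if question in similarities:
            earned += feature_value(params, similarities[question])
            best += params.match
        elif discount_unanswered:
            earned += params.unanswered
            best += params.match
        # Otherwise the unanswered feature is ignored entirely.
    if best <= 0:
        return 0.0
    return max(0.0, 100.0 * earned / best)


features = {name: FeatureParams() for name in ("cuisine", "price_range", "wine")}
answered = {"cuisine": 1.0, "price_range": 1.0}  # wine not asked yet
print(case_score(features, answered, discount_unanswered=False))
print(case_score(features, answered, discount_unanswered=True))
```

In this sketch, ignoring the unanswered wine question lets the case reach 100% and pass the default threshold, while discounting it holds the case at about 67% until the question is answered.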
For textual features, the feature can match using either a Soundex algorithm or an exact match. A Soundex match means that the feature could match without being an exact match, for example when the answer is spelled differently but sounds the same.
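For readers unfamiliar with Soundex, here is a compact sketch of the common American Soundex rules. Which Soundex variant Jnana uses is not stated in this post, so treat this as a general illustration of the technique rather than a description of the product.

```python
# Consonant groups and their Soundex digits (American Soundex).
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word, or '' if it has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (encoded + "000")[:4]


def soundex_match(answer, case_value):
    """True when two strings sound alike according to Soundex."""
    return soundex(answer) == soundex(case_value)


print(soundex_match("mexikan", "mexican"))
```

A misspelled answer such as "mexikan" produces the same code as "mexican", so the feature matches even though an exact comparison would fail.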
For numeric features there is both a standard algorithm and a Gaussian distribution function. A left and right distance parameter is needed in this case, as well as a precision.
These feature parameters can be specified at both the casebase and case level. Specifying them at the case level overrides any specification at the casebase level.
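The post does not define how the left distance, right distance and precision enter the Gaussian function, so the following sketch is one plausible reading and nothing more: each distance is treated as the spread of the curve on its side of the case value, and precision as the number of decimal places kept. The name gaussian_similarity is hypothetical.

```python
import math


def gaussian_similarity(answer, case_value, left_distance, right_distance, precision=2):
    """Bell-shaped similarity that may fall off differently on each side.

    Assumptions for illustration: left_distance and right_distance act as the
    standard deviation below and above case_value, and precision is the
    number of decimal places in the result.
    """
    width = left_distance if answer < case_value else right_distance
    if width <= 0:
        return 1.0 if answer == case_value else 0.0
    z = (answer - case_value) / width
    return round(math.exp(-0.5 * z * z), precision)


# A $6 restaurant where overspending is penalized more gently than underspending.
print(gaussian_similarity(4, 6, left_distance=2, right_distance=4))
print(gaussian_similarity(8, 6, left_distance=2, right_distance=4))
```

Separate left and right distances make the curve asymmetric, which is useful when, say, a budget slightly above the restaurant's price is a better fit than one slightly below it.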
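The override rule amounts to a simple merge in which case-level settings win. A minimal sketch, with hypothetical parameter names and an illustrative effective_params helper:

```python
def effective_params(casebase_params, case_params):
    """Combine parameters, letting case-level values override casebase-level ones."""
    merged = dict(casebase_params)
    merged.update(case_params)
    return merged


casebase_level = {"match": 10, "mismatch": -10, "unanswered": 0}
case_level = {"mismatch": -5}  # this one case is more forgiving of mismatches
print(effective_params(casebase_level, case_level))
```

Any parameter the case does not mention keeps its casebase-level value.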
(This should be considered an introduction to our CBR)
Thursday, July 10, 2008