
Chapter 5. Word Sense Disambiguation

5.3 Context Appropriateness and Concept Fitness

In this section, we explore context appropriateness and concept fitness more formally and introduce the notation that will be used later.

Suppose we have a set of concepts14. If we consider the relation between concepts and contexts, we obtain context appropriateness by fixing the concept. In other words, we want to know how appropriate a specific context is for a given concept. For example, if the concept bank/financial institution is given, the context $t_1$ = "he saved his money in the biggest bank" is appropriate, but the context $t_2$ = "he takes a walk on the river bank" is inappropriate. Note that although in the WSD case the set of concepts corresponds to the senses of a word, this formulation is not restricted to WSD.

Now, we define context appropriateness to be a real-valued function $f_{ca}(s_i, t)$ that can correctly rank the appropriateness of a context $t$ given a concept $s_i$, as shown in Equation 2.

14 In this chapter, we consider concept and word sense to be the same.


If we categorize contexts into two levels, we can say that

$f_{ca}(s_i, t_j) > f_{ca}(s_i, t_k)$, $\forall t_j \in T^{+}_{s_i}$ and $\forall t_k \in T^{-}_{s_i}$

Equation 2

Equation 2 just maintains an order between the appropriateness of contexts. In the simplest case, we use $T^{+}_{s_i}$ and $T^{-}_{s_i}$ to denote the sets of contexts that are appropriate and not appropriate for concept $s_i$, respectively. In the example above, context $t_1$ belongs to $T^{+}_{s_i}$ and $t_2$ belongs to $T^{-}_{s_i}$ for concept $s_i$ = bank/financial institution. This formulation is motivated by Kintsch (2001) and Mitchell and Lapata (2008), who measure sentence similarity after meaning composition. Their idea is that a good meaning composition model should produce a new vector that is closer to vectors with similar meanings. Applying this to a concept and its context, we want the machine to learn a good context appropriateness function that gives higher scores to appropriate contexts than to inappropriate contexts for a concept. In Equation 2, meaning composition is useful but not necessary: if there is enough information to judge the score, meaning composition can be skipped. However, meaning composition gives us a new way to process features in WSD problems, since we can feed the composed feature vector to machine learning algorithms instead of the raw features of the concept and the context. This viewpoint is new in WSD, and we will later illustrate approaches that utilize this feature processing to enlarge the size of the training data.
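As an illustration, the following minimal sketch shows a context appropriateness function over composed vectors that satisfies the ranking constraint of Equation 2. The toy vectors, the hand-set weight vector w, and the additive composition are assumptions made for illustration only; they are not the model developed in this thesis.

```python
import numpy as np

def compose(concept_vec, context_vec):
    """Meaning composition: simple vector addition (an illustrative choice)."""
    return concept_vec + context_vec

def f_ca(w, concept_vec, context_vec):
    """Context appropriateness: a linear score over the composed vector."""
    return float(w @ compose(concept_vec, context_vec))

# s_i = bank/financial institution; t1 is appropriate, t2 is not.
s_i = np.array([1.0, 0.2, 0.0])
t1 = np.array([0.9, 0.1, 0.0])   # "he saved his money in the biggest bank"
t2 = np.array([0.0, 0.1, 0.9])   # "he takes a walk on the river bank"

w = np.array([1.0, 0.5, -1.0])   # a hand-set scoring weight vector

# Equation 2 requires f_ca(s_i, t_j) > f_ca(s_i, t_k)
# for t_j in T+ and t_k in T- of concept s_i.
assert f_ca(w, s_i, t1) > f_ca(w, s_i, t2)
```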

In the knowledge extraction literature, as we mentioned earlier, a context appropriateness function is used to judge the reliability of extracted knowledge. Knowledge extraction researchers want to extract knowledge from free text in the tuple format (relation, argument 1, argument 2), such as extracting (IS-A, a car, a vehicle) from the sentence "a car is a vehicle usually driven by an engine of sorts". In this case, the concept is the IS-A relation, and meaning composition may take place between the two arguments "a car" and "a vehicle". We do not restrict how meaning composition is applied in Equation 2, and we will demonstrate how meaning composition works in WSD.

If meaning composition is used, we write $f_{ca}(s_i, \phi_{ca}(t_j))$, where $\phi_{ca}(t_j)$ is a vector and $\phi_{ca}$ is a meaning composition function for context appropriateness. For simplicity, we will hereafter use $f_{ca}$ to denote the function both without and with meaning composition.
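For concreteness, here is a minimal sketch of two composition functions evaluated by Mitchell and Lapata (2008), additive and element-wise multiplicative. The name phi_ca and the left-to-right folding over the context's word vectors are assumptions of this sketch, not the thesis's actual definition of $\phi_{ca}$.

```python
import numpy as np

def compose_additive(u, v):
    """p = u + v: additive composition (Mitchell & Lapata, 2008)."""
    return u + v

def compose_multiplicative(u, v):
    """p = u * v: element-wise multiplicative composition."""
    return u * v

def phi_ca(word_vectors, compose=compose_additive):
    """Compose a context's word vectors into a single context vector."""
    composed = word_vectors[0]
    for v in word_vectors[1:]:
        composed = compose(composed, v)
    return composed

# Usage: compose a three-word context under each composition function.
context = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.9, 0.1])]
print(phi_ca(context))                                  # additive
print(phi_ca(context, compose=compose_multiplicative))  # multiplicative
```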

If we consider the relation between concepts and contexts, we obtain concept fitness by fixing the context. We want to disambiguate the precise concept (or word sense) in a specific context. For example, concept $s_i$ = bank/financial institution is fitter than concept $s_k$ = bank/sloping land beside a body of water in the given context $t_1$ = "he saved his money in the biggest bank". This is exactly a WSD problem, but we use a more general viewpoint to reconsider it. We will find that we can reformulate WSD in many ways if we adopt this more general viewpoint. In addition, this viewpoint can unify different WSD settings, such as standard WSD and the graded word sense problem (Erk & McCarthy, 2009), in a single framework.

Like context appropriateness, we define concept fitness to be a real-valued function $f_{cf}(t_j, s)$ that can correctly rank the fitness of concepts given a context $t_j$, as shown in Equation 3.

If we categorize concepts into two levels, we can say that

$f_{cf}(t_j, s_i) > f_{cf}(t_j, s_k)$, $\forall s_i \in S^{+}_{t_j}$ and $\forall s_k \in S^{-}_{t_j}$

Equation 3

Equation 3 also just maintains an order between the fitness of concepts. In the simplest case, we use $S^{+}_{t_j}$ and $S^{-}_{t_j}$ to denote the sets of concepts that fit and do not fit context $t_j$, respectively.

In the standard WSD setting, $t_j$ is a context to be disambiguated, $S^{+}_{t_j}$ usually contains one word sense or multiple word senses, and $S^{-}_{t_j}$ contains the other word senses that do not fit context $t_j$. In the graded word sense problem (Erk & McCarthy, 2009), the function $f_{cf}$ returns different scores for different word senses. Therefore, concept fitness is more general than the standard WSD problem. As with context appropriateness, meaning composition can be adopted in a useful way.
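The following minimal sketch illustrates how a single concept fitness function covers both settings: graded word sense keeps one score per sense, while standard WSD takes the arg max. Scoring senses by cosine similarity between sense and context vectors is an illustrative assumption, not the thesis's actual model.

```python
import numpy as np

def f_cf(context_vec, sense_vec):
    """Concept fitness, sketched here as cosine similarity."""
    return float(context_vec @ sense_vec /
                 (np.linalg.norm(context_vec) * np.linalg.norm(sense_vec)))

# Toy sense inventory for "bank" in an illustrative 3-dimensional space.
senses = {
    "bank/financial-institution": np.array([1.0, 0.2, 0.0]),
    "bank/sloping-land": np.array([0.0, 0.1, 1.0]),
}
t1 = np.array([0.9, 0.1, 0.1])  # "he saved his money in the biggest bank"

# Graded word sense: keep a fitness score for every sense.
scores = {s: f_cf(t1, v) for s, v in senses.items()}

# Standard WSD: take the single fittest sense (the arg max).
best = max(scores, key=scores.get)
print(scores, "->", best)
```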

If meaning composition is used, we write $f_{cf}(\phi_{cf}(t_j), s_i)$, where $\phi_{cf}(t_j)$ is a vector and $\phi_{cf}$ is a meaning composition function for concept fitness. For simplicity, we will hereafter use $f_{cf}$ to denote the function both without and with meaning composition.

Now, we jointly consider context appropriateness and concept fitness. We can derive Equation 4 to Equation 6 when different constraints are adopted. If we use two levels (fit and not-fit) for both context appropriateness and concept fitness, we get Equation 4 and Equation 5, where $T^{+}_{s_i}$, $T^{-}_{s_i}$, $S^{+}_{t_j}$, and $S^{-}_{t_j}$ are the sets defined above.

$s_i \in S^{+}_{t_j} \Rightarrow t_j \in T^{+}_{s_i}$, and $s_k \in S^{-}_{t_j} \Rightarrow t_j \in T^{-}_{s_k}$

Equation 4

$t_j \in T^{+}_{s_i} \Rightarrow s_i \in S^{+}_{t_j}$, and $t_k \in T^{-}_{s_i} \Rightarrow s_i \in S^{-}_{t_k}$

Equation 5

Equation 4 and Equation 5 can be derived from the definitions of Equation 3 and Equation 2, respectively. They suggest that context appropriateness and concept fitness provide different viewpoints for judging the validity of combining a concept and a context. If we carefully choose different representation schemes for these two viewpoints, we can generate more training data for supervised algorithms. For example, if a word has 5 senses and there is a context $t_j$ with only one correct concept $s_i$, the 4 concepts that do not fit context $t_j$ are in $S^{-}_{t_j}$, and we can then generate 4 training instances by using Equation 4. These training instances are shown in the following.
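A minimal sketch of this generation step follows; the five-sense inventory for "bank" and the string labels are illustrative assumptions, not the thesis's actual training data.

```python
# Generating training instances with Equation 4: every sense that does not
# fit context t_j becomes a negative (concept, context) instance.

senses = ["bank/financial-institution", "bank/sloping-land",
          "bank/row-of-objects", "bank/flight-maneuver", "bank/fund-reserve"]

t_j = "he saved his money in the biggest bank"
correct = "bank/financial-institution"   # the only concept in S+ for t_j

# Equation 4: each sense s_k in S- for t_j yields t_j in T- for s_k.
negatives = [(s_k, t_j, "inappropriate") for s_k in senses if s_k != correct]
positive = (correct, t_j, "appropriate")  # and t_j in T+ for s_i

for instance in [positive] + negatives:
    print(instance)  # prints 1 positive and 4 negative instances
```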
