
(1)

NTU MiuLab

Semantically-Aligned Equation Generation

for Solving and Reasoning Math Word Problems

Ting-Rui Chiang and Yun-Nung (Vivian) Chen

https://github.com/MiuLab/E2EMathSolver

(2)

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

Math Word Problem

x = (10 − 1 × 5) ÷ 0.5

Reasoning & Solving
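The solving step can be checked with one line of arithmetic (a minimal sketch; the variable names are ours, not the paper's):

```python
# Worked check of the slide's example: Tom has $10, a pen costs $1,
# he buys 5 pens, and a notebook costs $0.5 (names are illustrative).
money, pen_price, pens_bought, notebook_price = 10, 1, 5, 0.5
x = (money - pen_price * pens_bought) / notebook_price
print(x)  # 10.0
```

So Tom can buy 10 notebooks, matching the equation on the slide.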

(3)


Prior Work

Non-neural approaches

• Template-based

(Kushman et al., Upadhyay and Chang)

Rely on hand-crafted features!

Deep learning

• Seq2Seq

(Wang et al., Ling et al.)

Does not exploit the structure of math expressions.

Template-based: fill a template's slots
x = (? + ?) × ? - ?  →  x = (1 + 2) × 3 - 4

Seq2Seq: generate the equation directly
Problem  →  x = (1 + 2) × 3 - 4

Our model is end-to-end and structural!

(4)

Overview of the Proposed Model

[Figure: an encoder reads the problem text; a decoder then emits one stack action per step to form the equation.]

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

x = (10 − 1 × 5) ÷ 0.5

(5)

Look Again at the Problem

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

[Figure: the quantities $0.5, $1, $10, and the unknown ? highlighted in the problem text.]

(6)

Semantic Meaning of the Operands

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

x = ( 10 − 1 × 5 ) ÷ 0.5
  10 — the amount of money Tom has
  1 — price of a pen
  0.5 — price of a notebook
  5 — number of pens bought

(7)


Idea: Bridging Symbolic and Semantic Worlds

Symbolic World Semantic World

(8)

Preprocess

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

Symbolic part (quantities extracted from the text): 0.5, 1, 10, 5
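The preprocessing step can be sketched as a number-extraction pass; the `preprocess` name and the `<nK>` placeholder format are our assumptions for illustration, not the paper's exact scheme:

```python
import re

def preprocess(problem):
    """Replace each quantity in the text with an indexed placeholder,
    keeping the extracted numbers as the symbolic part (illustrative)."""
    numbers = []
    def replace(match):
        numbers.append(float(match.group()))
        return f"<n{len(numbers) - 1}>"
    text = re.sub(r"\d+(?:\.\d+)?", replace, problem)
    return text, numbers

text, numbers = preprocess(
    "Each notebook takes $0.5 and each pen takes $1. Tom has $10. "
    "How many notebooks can he buy after buying 5 pens?")
print(numbers)  # [0.5, 1.0, 10.0, 5.0]
```

The decoder can then refer to quantities by position instead of by literal value.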

(9)

Symbol Encoding

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

Symbolic part: 0.5, 1, 10, 5 — each encoded into its semantic part

(10)

Inside Encoder

[Figure: the encoder reading the tokens "Each notebook takes $ 0.5 and ...".]

(11)

Semantic Generation for Unknown x

[Figure: a semantic representation for the unknown x is generated from the encoder states over "Each notebook takes $ 0.5 and ...".]

(12)

Operands & Their Semantics

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

Symbolic part: 0.5, 1, 10, 5, x — each operand paired with its semantic part

(13)

Intuition of Using Semantics

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

x = ( 10 − 1 ? 5 )
  1 — price of a pen
  5 — number of pens bought
Knowing the operands' semantics (a price and a count) suggests the missing operator: ×.

(14)

Equation Generation in Postfix

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

x 10 1 5 × − 0.5 ÷ =
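A postfix sequence can be evaluated with a plain stack, which is what makes this output format convenient. This evaluator is a sketch with ASCII operator tokens, not the paper's code:

```python
def eval_postfix(tokens):
    """Evaluate the right-hand side of a postfix equation with a stack."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

# The slide's right-hand side, 10 1 5 × − 0.5 ÷, in ASCII tokens:
print(eval_postfix(["10", "1", "5", "*", "-", "0.5", "/"]))  # 10.0
```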

(15)

Equation Generation by Stack Actions

• A stack is used.
• The decoder generates stack actions.
• The equation is built by applying those actions to the stack.

x = (10 − 1 × 5) ÷ 0.5

[Figure: the decoder emits one stack action per step.]
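The action semantics can be mirrored by a small interpreter; the `push v` notation and the function name are ours, for illustration only:

```python
def run_actions(actions):
    """Apply stack actions: 'push v' pushes an operand, a binary operator
    pops two operands and pushes the combined sub-expression, and '='
    pops both sides of the finished equation (illustrative sketch)."""
    stack = []
    for act in actions:
        if act.startswith("push "):
            stack.append(act[len("push "):])
        elif act == "=":
            rhs, lhs = stack.pop(), stack.pop()
            return f"{lhs} = {rhs}"
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {act} {b})")

equation = run_actions(["push x", "push 10", "push 1", "push 5",
                        "×", "−", "push 0.5", "÷", "="])
print(equation)  # x = ((10 − (1 × 5)) ÷ 0.5)
```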

(16)

Action Selection in Each Step

[Figure: at each decoding step, a classifier over the decoder state selects one stack action from {+, −, ×, ÷, =, Push}.]
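At each step the classifier scores the candidate actions; a minimal softmax selection over hypothetical scores might look like:

```python
import math

ACTIONS = ["+", "-", "×", "÷", "=", "Push"]

def select_action(logits):
    """Softmax over the fixed action set; the decoder takes the most
    probable stack action (the scores below are made up)."""
    total = sum(math.exp(v) for v in logits)
    probs = [math.exp(v) / total for v in logits]
    return ACTIONS[probs.index(max(probs))]

print(select_action([0.1, 0.2, 0.3, 0.1, 0.0, 2.0]))  # Push
```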

(17)

Equation Generation by Stack Actions

Target equation: x = (10 − 1 × 5) ÷ 0.5
Operands available: 0.5, 1, 10, 5, x

Action: Push → stack: x

(18)

Target equation: x = (10 − 1 × 5) ÷ 0.5

Generated actions so far: Push, Push, Push, Push
Stack after each push: x → x 10 → x 10 1 → x 10 1 5

(19)

Target equation: x = (10 − 1 × 5) ÷ 0.5

Stack: x 10 1 5
Action: × → pops 1 and 5, pushes 1 × 5
Stack: x 10 (1 × 5)

(20)

Target equation: x = (10 − 1 × 5) ÷ 0.5

After many steps…
Generated actions: x 10 1 5 × − 0.5 ÷ =
Result: x = (10 − 1 × 5) ÷ 0.5

(21)

Training Process

• The target equation is given.
• Trained as a Seq2Seq model on the gold action sequence (teacher forcing).

Each notebook takes $0.5 and each pen takes $1. Tom has $10. How many notebooks can he buy after buying 5 pens?

Decoder input:  <bos> x 10 1 5 …
Decoder target: x 10 1 5 …
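The `<bos>`-shifted pairing is standard teacher forcing; a sketch of how inputs and labels line up (tokens mirror the slide):

```python
# Gold action sequence for the example equation, as on the slide.
target = ["x", "10", "1", "5", "×", "−", "0.5", "÷", "="]

# Teacher forcing: the decoder sees the previous gold action and must
# predict the next one.
decoder_input = ["<bos>"] + target[:-1]   # shifted right by one step
decoder_label = target
pairs = list(zip(decoder_input, decoder_label))
print(pairs[0])  # ('<bos>', 'x')
```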

(22)

Experiments

• Dataset: Math23K
  • In Chinese
  • 23,000 math word problems
  • Operators: +, −, ×, ÷

(23)

Results

[Bar chart: accuracy (45–70) for Retrieval, BLSTM, Self-Attention, Seq2Seq w/ SNI, Proposed, and Hybrid, grouped as retrieval / template / generation / ensemble; annotated gaps of ≈ 8% and > 1%.]

(24)

Ablation Test

[Bar chart: accuracy (59–66) for Char-Based, Word-Based, Word-Based −Semantic, Word-Based −Gate, Word-Based −Gate −Attention, and Word-Based −Gate −Attention −Stack; annotated gaps of ≈ 3% and ≈ 2.5%.]

(25)

Self-Attention for Qualitative Analysis

[Figure, shown across two slides: encoder self-attention weights over the tokens "Each notebook takes $ 0.5 and ...".]

(27)

Attention for Operand Semantics

The attention focuses on:
• Informative verbs, e.g. "gain", "get", "fill"
• Quantifier-related words, e.g. "every", "how many"

(28)

Conclusion

Three main contributions:
• Approach: equation generation with a stack
• Originality: automatic extraction of operand semantics
• Performance: a state-of-the-art end-to-end neural model on Math23K

(29)


Code Available @

https://github.com/MiuLab/E2EMathSolver
