A Tool to Analyze
Verb Phrase and Noun Phrase Relationship in Sentences
Te-En Huang 1, Tao-Hsing Chang 2, ADAT Technology Co. 3, Jon-Fan Hu 4 Department of Psychology, National Cheng Kung University, Tainan City, Taiwan
ABSTRACT
spaCy is a well-known NLP package that can delineate verb phrases and direct objects in English, using default structures to define noun phrases. However, spaCy lacks a function that takes into account the status of adjectives and the wide variety of noun phrase structures, so it cannot identify the relationships between verbs and nouns efficiently. The present study develops a spaCy-based program that customizes the practical noun phrase structures found in industrial SOPs for machine operations. The program is better at merging overlapping structures. For example, the sentence “An important thing of NLP is hard to define” is first processed into the candidates “An important thing”, “NLP”, and “thing of NLP”, which are then automatically merged into one noun phrase, “An important thing of NLP”. The program can abstract the core concepts of sentences and recognize the co-occurrences of noun phrases and their associated verbs in a corpus, for both research and application purposes.
1. INTRODUCTION
In industry, a large number of machines require experienced people to repair them and to install machine parts. However, experienced people are often in short supply. Training maintenance technicians takes a great deal of time and becomes a cost burden. Hence, we want a simple way to train maintenance technicians. ADAT Technology Co. has tried using AR (Augmented Reality) to guide operators through fixing machines step by step. When the operator starts to fix the machine, AR can immediately detect the condition the machine is in. At the same time, the operator can achieve the same results as a skilled worker.
However, certain problems have arisen. If we put the whole operation manual into AR, it is hard for the operator to identify the main process. Thus, we started to analyze the sentences in operation manuals. We found that the most important parts of a sentence are the verbs and the noun phrases. Take, for example, “Take this new movable screwdriver to turn the screw on the right-hand side”. In this sentence, “new movable screwdriver” and “take” are the important words. With spaCy (Honnibal, 2016) alone, it is hard to detect all the complete noun phrases. Our program can take over this function: we use it to find the relationships between verbs and complete noun phrases automatically. Finally, we can change this sentence into several simple imperative sentences, such as “Take new movable screwdriver.” and “Turn screw.” In this way, all the points of emphasis in the sentence are presented, and the sentence is easier to understand in AR. In addition, we use our program to produce a relationship table of verbs and noun phrases, from which the relationships can be understood easily. In the future, we can use this table to replace a difficult verb with a simpler one. For example, “Suspend” can be changed automatically into “Hang”. The operator will then not be confused by unfamiliar words and will clearly understand what the next correct step is.
2. METHOD
Figure 1. Program flow chart
2.1 Convert PDF File into TXT File automatically
In industry, most existing documentation is in PDF format, which our program cannot read directly. Therefore, we use PDFBox (Litchfield, Wilson & Koch, 2003) to convert PDF files into TXT files automatically.
2.2 Organize files and delete unneeded information
2.2.1 Delete excess PDF words
The converted documents contain unneeded text. We therefore count every repeated or meaningless part and automatically remove the extra words.
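The counting step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the converted text is available page by page, and the function name `strip_repeated_lines` and the `min_fraction` threshold are hypothetical choices for this sketch.

```python
from collections import Counter


def strip_repeated_lines(pages, min_fraction=0.5):
    """Drop lines that recur on at least `min_fraction` of the pages.

    `pages` is a list of pages, each page a list of text lines.
    The threshold is an assumption made for illustration only.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page if line.strip()})
    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {line for line, n in counts.items() if n >= threshold}
    return [[line for line in page if line.strip() not in repeated]
            for page in pages]


# Hypothetical pages: the manual title repeats as a running header.
pages = [
    ["Example Manual", "Remove the panel."],
    ["Example Manual", "Install the coil."],
    ["Example Manual", "Push the lock."],
]
cleaned = strip_repeated_lines(pages)
```

Lines that appear only once, such as the individual steps, are kept; only the recurring header is removed.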
2.2.2 Extract PDF subject
Factory documents contain subject information such as “title”, “required tools”, and “device”. We extract these fields from the text file and record them in the program.
2.2.3 Extract inner text
The most important part of the document is the inner text, for example the first step, the second step, warnings, and cautions. We therefore extract these items for the subsequent matching of verbs and noun phrases.
2.3 Build information structure
To show the details of each step, we need a structured way (Fig. 2) to separate all the information in the document, so that each step can be identified clearly.
Figure 2. Data structure
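As an illustration of such a structure, the sketch below stores the subject fields and inner text named in Sections 2.2.2 and 2.2.3. The actual layout of Fig. 2 is not reproduced here, so the `Procedure` class, its field names, and the assumed line formats ("Title: ...", "1. ...", "Warning: ...") are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """Hypothetical container for one document's extracted information."""
    title: str = ""
    required_tools: list = field(default_factory=list)
    device: str = ""
    steps: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    cautions: list = field(default_factory=list)


# Assumed line formats; real manuals will need their own rules.
LABELLED = re.compile(
    r"^\s*(title|required tools|device|warning|caution)\s*:\s*(.*)$",
    re.IGNORECASE,
)
STEP = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")


def build_procedure(lines):
    """Sort text lines into subject fields, steps, warnings and cautions."""
    proc = Procedure()
    for line in lines:
        m = LABELLED.match(line)
        if m:
            key, value = m.group(1).lower(), m.group(2).strip()
            if key == "title":
                proc.title = value
            elif key == "device":
                proc.device = value
            elif key == "required tools":
                proc.required_tools = [t.strip() for t in value.split(",") if t.strip()]
            elif key == "warning":
                proc.warnings.append(value)
            else:
                proc.cautions.append(value)
            continue
        m = STEP.match(line)
        if m:
            proc.steps.append(m.group(2).strip())
    return proc


proc = build_procedure([
    "Title: Replace coil",
    "Required tools: screwdriver, wrench",
    "Device: pump",
    "1. Remove the panel.",
    "Warning: Switch off power.",
    "2) Install the coil.",
])
```

Keeping steps, warnings, and cautions in separate lists lets the later analysis run on the steps alone.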
2.4 Mark up the structure of every word in spaCy
After structuring all the information, we can start to analyze verbs and noun phrases. We pass each sentence to spaCy for a preliminary analysis, which yields part-of-speech (POS) tags and a dependency parse (Fig. 3). On our documents, this analysis contained some POS tag errors and dependency parse errors (Fig. 3, “switch”). We therefore use this result as the starting point for the next analysis.
2.5 Combine words into noun phrases according to the predefined "industrial" sentence structures
The previous section showed spaCy's limitations. We therefore organized and analyzed the structures found in the documents and categorized several common error structures. With these, we can merge and modify words according to our own definitions (Fig. 4).
Figure 4. Merging noun phrases
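The merging of overlapping candidates can be sketched as a standard interval merge over token positions. This is one plausible way to do it, not necessarily the authors' method; the function name `merge_overlapping` and the half-open `(start, end)` span representation are assumptions for this sketch. The example reuses the sentence from the abstract.

```python
def merge_overlapping(spans):
    """Merge half-open (start, end) token spans that overlap."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


tokens = "An important thing of NLP is hard to define".split()
# Candidate phrases: "An important thing", "NLP", "thing of NLP".
candidates = [(0, 3), (4, 5), (2, 5)]
phrases = [" ".join(tokens[s:e]) for s, e in merge_overlapping(candidates)]
print(phrases)  # ['An important thing of NLP']
```

Spans that merely touch, without sharing a token, are left separate, so two adjacent noun phrases are not fused by accident.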
2.6 Re-detect the relationship between noun phrases and verbs
After our program has re-edited the sentence, we pass it to spaCy again for analysis. At this point the result is much better than the original one (Fig. 5).
Figure 5. Corrected structure produced by our program
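Once noun phrases have been merged into single tokens, reading off the verb and object pairs is straightforward. The sketch below is an assumption-laden illustration: each token is a plain dictionary with `text`, `pos`, `dep`, and `head` (the index of the governing token), mirroring the attributes a spaCy parse exposes, and `dobj` is the direct-object label used by spaCy's English models. The function name `verb_object_pairs` is hypothetical.

```python
def verb_object_pairs(tokens):
    """Return (verb, noun phrase) pairs for direct objects of verbs."""
    pairs = []
    for tok in tokens:
        head = tokens[tok["head"]]
        if tok["dep"] == "dobj" and head["pos"] == "VERB":
            pairs.append((head["text"].lower(), tok["text"]))
    return pairs


# Hand-written parse of "Take this new movable screwdriver",
# with the noun phrase already merged into one token.
parsed = [
    {"text": "Take", "pos": "VERB", "dep": "ROOT", "head": 0},
    {"text": "this new movable screwdriver", "pos": "NOUN", "dep": "dobj", "head": 0},
]
print(verb_object_pairs(parsed))  # [('take', 'this new movable screwdriver')]
```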
2.7 Automatically list all noun phrase and verb relationships
Finally, once verb phrase and noun phrase detection is complete for all the documents to be analyzed, we count the results into a relationship table. Later, we can use it to change each sentence accurately into a simple and concise imperative sentence.
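The counting behind such a relationship table can be sketched as below. This is a minimal illustration under the assumption that the earlier steps yield a list of `(verb, noun phrase)` pairs; the function name `build_relation_table` and the sample pairs are hypothetical.

```python
from collections import Counter


def build_relation_table(pairs):
    """Count how often each verb co-occurs with each noun phrase.

    Returns {noun_phrase: {verb: count}} with a column for every verb.
    """
    counts = Counter(pairs)
    verbs = sorted({verb for verb, _ in pairs})
    phrases = sorted({phrase for _, phrase in pairs})
    return {p: {v: counts[(v, p)] for v in verbs} for p in phrases}


# Hypothetical pairs extracted from a few steps.
pairs = [
    ("install", "PANEL PR19"),
    ("remove", "PANEL PR19"),
    ("click", "menu"),
]
table = build_relation_table(pairs)
```

Every row carries the same verb columns, with zeros for pairs that never occur, which matches the layout of a co-occurrence table.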
3. RESULTS
spaCy is a well-known package for NLP analysis, but sentences in industrial documents often follow structures that its default rules do not cover. We therefore developed a program that can use self-defined structures (Table 1).
Table 1. Self-defined structures
{(<POS>NOUN<OP>+)(<POS>NUM<OP>+)}
{(<POS>DET<OP>?)(<POS>ADJ<OP>*)(<POS>NOUN<OP>+)}
{(<POS>DET<OP>?)(<POS>ADJ<OP>*)(<POS>PROPN<OP>+)}
{(<POS>DET<OP>+)(<POS>VERB<OP>*)(<POS>NOUN<OP>+)}
{(<POS>DET<OP>+)(<POS>VERB<OP>*)(<POS>PROPN<OP>+)}
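To make the notation in Table 1 concrete, the sketch below converts one rule into a list of dictionaries with `"POS"` and `"OP"` keys, which is the token-pattern format that spaCy's `Matcher` accepts. Whether the authors' program uses `Matcher` internally is an assumption; the regular expression and the function name `to_matcher_pattern` are hypothetical.

```python
import re

# One "(<POS>TAG<OP>quantifier)" group from the Table 1 notation.
TOKEN = re.compile(r"\(<POS>([A-Z]+)<OP>([+*?])\)")


def to_matcher_pattern(rule):
    """Convert a Table 1 rule into a list of token-pattern dictionaries."""
    return [{"POS": pos, "OP": op} for pos, op in TOKEN.findall(rule)]


pattern = to_matcher_pattern("{(<POS>DET<OP>?)(<POS>ADJ<OP>*)(<POS>NOUN<OP>+)}")
print(pattern)
# [{'POS': 'DET', 'OP': '?'}, {'POS': 'ADJ', 'OP': '*'}, {'POS': 'NOUN', 'OP': '+'}]
```

Read this way, the rule means an optional determiner, any number of adjectives, and one or more nouns, so it would cover a phrase such as "this new movable screwdriver".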
We also add a table (Table 2) in which users can define proper noun phrases themselves. The noun phrases listed in this table will not be misjudged by spaCy.
Table 2. User-defined proper noun phrases
the TWINSCAN Run Test window the Proper Procedure
THE RH SENSORS Figure Reference
THE ADT This VSO
View Sensors This Procedure Correct Procedure
the RH A INSIDE
ADT window the SERV.630.86784
the update No crh144a.rep
the values B OUTSIDE
the sensors platform position
View Sensors window all reticles
The RH the system
IRIS IRM the IRL
Reticle Stage the TWINSCAN Navigation Manager
THE PROCEDURE the Test drop
the Run Test window the Run Test menu
Finally, our program can indicate the correct sentence structures (Fig. 6) and produce a relationship table between verbs and noun phrases (Table 3). In addition, people who cannot program can easily use it in any area of expertise.
Figure 6. All processes. Chart labels: Input PDF files; Extract information automatically; Define noun phrase structure; Define proper noun phrases; Analyze sentences; Re-examine the structure of the sentence and fix structure; List noun phrases; List verb phrases; List relationship table.
Table 3. Verbs and noun phrases relationship table
NP            install  remove  push  click
PANEL PR19    1        1       0     0
THE VSO       1        1       0     0
menu          0        0       0     1
4. DISCUSSION
In industry, installation and repair are highly fixed and repetitive work. Therefore, ADAT Technology Co. used AR technology to let novices handle these tasks quickly and accurately. With this technique, a senior repairman is needed only in urgent cases. This approach not only greatly reduces personnel costs but can also detect which parts have problems, such as coils falling off or screws not being tightened. As a result, machines can be installed, maintained, and repaired efficiently and correctly.
Our program automatically extracts all the important information and analyzes the relationships between verbs and noun phrases. It shows which parts are the most important and lists all the relationships between NPs and VPs. From past research, we know that the longer a sentence is, the harder it is to understand. Therefore, in the future we can use our program to rewrite a sentence into several imperative sentences. We can also add a function that replaces a difficult verb with a simpler one. This will not only improve the operator's understanding of the process but also enable the operator to complete machine repairs accurately and quickly.
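A minimal sketch of this future extension is given below, using the examples from the introduction. It is an illustration only: the verb lookup table, the determiner list, and the function name `to_imperative` are hypothetical, and a real system would need a curated vocabulary.

```python
# Hypothetical lookup of difficult verbs and their simpler replacements.
SIMPLER_VERBS = {"suspend": "hang"}
# Determiners dropped to keep the instruction short (an assumption).
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}


def to_imperative(verb, noun_phrase):
    """Build a short imperative sentence from a verb and a noun phrase."""
    verb = SIMPLER_VERBS.get(verb.lower(), verb.lower())
    words = [w for w in noun_phrase.split() if w.lower() not in DETERMINERS]
    return f"{verb.capitalize()} {' '.join(words)}."


print(to_imperative("take", "this new movable screwdriver"))
# Take new movable screwdriver.
print(to_imperative("Suspend", "the coil"))
# Hang coil.
```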
Our program can be used for multiple purposes, because language is everywhere. For example, it could be used to extract the keywords of an article and to integrate information. We can therefore build many applications on this program, such as detecting the subject of a piece of writing, predicting difficulty, and even building a more humane chat robot. This program is thus important not only for industry; it also allows people who do not write programs to analyze the correlation between statements by themselves.
REFERENCES
Ben Litchfield & Daniel Wilson & Philipp Koch (2003). PDFBox Web site (https://sourceforge.net/projects/pdfbox/)
Gaifman, Haim (1965) Dependency systems and phrase-structure systems. Information and Control 8:304–307.
Hays, David G. (1966) Parsing. In David G. Hays, ed., Readings in Automatic Language Processing. New York: American Elsevier.
Honnibal, M. (2016). spaCy (version 1.3.0). spaCy Web site (https://spacy.io/)
Hudson, Richard A. (1991) English Word Grammar. Oxford: Blackwell.
Marie-Catherine de Marneffe & Joakim Nivre. (2019) Dependency Grammar. Annual Review of Linguistics 5:1, 197-218.
Chen, P. P. (1983). English Sentence Structure and Entity-Relationship