Abstract:
Automatic extraction of formal knowledge specifications from Natural Language (NL) text is a challenging research and development area. Currently, the task is considered feasible for restricted NL input only. A number of Conceptual Graph (CG) researchers have approached the problem by applying Sowa's algorithm, which analyses NL input by joining relevant canonical graphs. This paper summarises the state of the art and describes the CGExtract prototype, which approaches the task by integrating Parasite, an existing component for NL analysis and understanding (NLU). This powerful NLU engine produces a logical form for each correct sentence and processes coreferences in an extended discourse of several sentences. Given an initial type hierarchy and relevant lexicon information, CGExtract constructs new knowledge base (KB) graphs corresponding to the input text by (i) checking whether the input sentences represent one connected graph, (ii) proving KB consistency, i.e. proving whether the new graph contradicts the existing KB graphs, and (iii) proving whether the new graph yields loop definitions with existing KB graphs. The CGWorld workbench [4] supports the user interface of CGExtract.
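
As an illustration only, checks (i) and (iii) can be sketched as ordinary graph tests. The sketch below is not CGExtract's implementation: the function names `is_connected` and `has_definition_loop`, the edge-list representation of a graph, and the dictionary of type definitions are all assumptions introduced here. Check (ii), consistency with the existing KB, requires a prover and is not covered.

```python
from collections import defaultdict, deque


def is_connected(nodes, edges):
    """Return True if the undirected graph over `nodes` forms one component.

    Hypothetical representation: `nodes` is an iterable of concept/relation
    labels, `edges` is an iterable of (label, label) pairs.
    """
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from an arbitrary node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return nodes <= seen


def has_definition_loop(definitions):
    """Return True if some type is, transitively, defined in terms of itself.

    Hypothetical representation: `definitions` maps a type name to the list
    of type names that appear in its definition.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = defaultdict(int)

    def visit(name):
        colour[name] = GREY
        for used in definitions.get(name, ()):
            if colour[used] == GREY:
                return True  # back edge: the definition chain loops
            if colour[used] == WHITE and visit(used):
                return True
        colour[name] = BLACK
        return False

    return any(colour[name] == WHITE and visit(name) for name in list(definitions))
```

In this sketch, a new graph would be accepted for further processing only if `is_connected` returns `True` for it and `has_definition_loop` returns `False` once its definitions are merged with those already in the KB.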