Feature #2738
Add "Part of Speech Tag" to Word
Status: | Resolved | Start date: | 09/24/2017 |
Priority: | Normal | Due date: | |
Assignee: | D. Hamann | % Done: | 100% |
Category: | Type Proposal | | |
Target version: | Robotics Service Bus - rsb-0.17 | | |
We branched from 0.15 and would like to have this field added to 0.15.
Related issues
Associated revisions
Added field part-of-speech-tag to Word in proto/stable/rst/dialog/SpeechHypothesis.proto
fixes #2738
Signed-off-by: Jan Moringen <jmoringe@techfak.uni-bielefeld.de>
Backport: Added field part_of_speech_tag to Word in proto/stable/rst/dialog/SpeechHypothesis.proto
refs #2738
Signed-off-by: Jan Moringen <jmoringe@techfak.uni-bielefeld.de>
(cherry picked from commit ebd6d16853ff239e27fca161f7e46fd3a2bc6530)
Added field part_of_speech_tag to Word in proto/stable/rst/dialog/SpeechHypothesis.proto
fixes #2738
Signed-off-by: Jan Moringen <jmoringe@techfak.uni-bielefeld.de>
Backport: Added field part_of_speech_tag to Word in proto/stable/rst/dialog/SpeechHypothesis.proto
refs #2738
Signed-off-by: Jan Moringen <jmoringe@techfak.uni-bielefeld.de>
(cherry picked from commit 5e87d699aef234e4bdbbf4bd9b7b36a2e0e502ee)
#1 Updated by J. Moringen about 7 years ago
- Status changed from New to Feedback
- Assignee set to D. Hamann
There are two issues with this:
1. Regarding the tagging scheme itself:
   - What is the canonical name and documentation for this tagging scheme?
   - The documentation under http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/ suggests that the tagging scheme is called Stuttgart-Tübingen-Tagset (STTS). Is this correct?
   - If so, what is the reason for choosing this particular scheme?
   - To which languages is this scheme applicable?
2. The individual enum items were not documented.
Based on the above issues, I made the following version of the enum and field:
    /**
     * Part-of-speech tags using a modified version of the
     * Stuttgart-Tübingen-Tagset (STTS).
     *
     * Differences w.r.t. STTS are:
     *
     * * ``KOMM`` instead of ``$,``
     * * ``END`` instead of ``$.``
     * * ``IPNCT`` instead of ``$(``
     *
     * @see http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/stts.asc
     * "Description of the STTS (in German)"
     */
    enum PosTag {

        /**
         * Attributives Adjektiv.
         */
        ADJA = 1;

        /**
         * Adverbiales oder prädikatives Adjektiv.
         */
        ADJD = 2;

        /**
         * Adverb.
         */
        ADV = 3;

        /**
         * Präposition; Zirkumposition links.
         */
        APPR = 4;

        /**
         * Präposition mit Artikel.
         */
        APPRART = 5;

        /**
         * Postposition.
         */
        APPO = 6;

        /**
         * Zirkumposition rechts.
         */
        APZR = 7;

        /**
         * Bestimmter oder unbestimmter Artikel.
         */
        ART = 8;

        /**
         * Kardinalzahl.
         */
        CARD = 9;

        /**
         * Fremdsprachliches Material.
         */
        FM = 10;

        /**
         * Interjektion.
         */
        ITJ = 11;

        /**
         * Ordinalzahl.
         */
        ORD = 12;

        /**
         * Unterordnende Konjunktion.
         */
        KOUI = 13;

        /**
         * Unterordnende Konjunktion.
         */
        KOUS = 14;

        /**
         * Nebenordnende Konjunktion.
         */
        KON = 15;

        /**
         * Vergleichskonjunktion.
         */
        KOKOM = 16;

        /**
         * Normales Nomen.
         */
        NN = 17;

        /**
         * Eigennamen.
         */
        NE = 18;

        /**
         * Substituierendes Demonstrativpronomen.
         */
        PDS = 19;

        /**
         * Attribuierendes Demonstrativpronomen.
         */
        PDAT = 20;

        /**
         * Substituierendes Indefinitpronomen.
         */
        PIS = 21;

        /**
         * Attribuierendes Indefinitpronomen ohne Determiner.
         */
        PIAT = 22;

        /**
         * Attribuierendes Indefinitpronomen mit Determiner.
         */
        PIDAT = 23;

        /**
         * Irreflexives Personalpronomen.
         */
        PPER = 24;

        /**
         * Substituierendes Possessivpronomen.
         */
        PPOSS = 25;

        /**
         * Attribuierendes Possessivpronomen.
         */
        PPOSAT = 26;

        /**
         * Substituierendes Relativpronomen.
         */
        PRELS = 27;

        /**
         * Attribuierendes Relativpronomen.
         */
        PRELAT = 28;

        /**
         * Reflexives Personalpronomen.
         */
        PRF = 29;

        /**
         * Substituierendes Interrogativpronomen.
         */
        PWS = 30;

        /**
         * Attribuierendes Interrogativpronomen.
         */
        PWAT = 31;

        /**
         * Adverbiales Interrogativ- oder Relativpronomen.
         */
        PWAV = 32;

        /**
         * Pronominaladverb.
         */
        PAV = 33;

        /**
         * "zu" vor Infinitiv.
         */
        PTKZU = 34;

        /**
         * Negationspartikel.
         */
        PTKNEG = 35;

        /**
         * Abgetrennter Verbzusatz.
         */
        PTKVZ = 36;

        /**
         * Antwortpartikel.
         */
        PTKANT = 37;

        /**
         * Partikel bei Adjektiv oder Adverb.
         */
        PTKA = 38;

        /**
         * SGML Markup.
         */
        SGML = 39;

        /**
         * Buchstabierfolge.
         */
        SPELL = 40;

        /**
         * Kompositions-Erstglied.
         */
        TRUNC = 41;

        /**
         * Finites Verb, voll.
         */
        VVFIN = 42;

        /**
         * Imperativ, voll.
         */
        VVIMP = 43;

        /**
         * Infinitiv, voll.
         */
        VVINF = 44;

        /**
         * Infinitiv mit "zu", voll.
         */
        VVIZU = 45;

        /**
         * Partizip Perfekt, voll.
         */
        VVPP = 46;

        /**
         * Finites Verb, aux.
         */
        VAFIN = 47;

        /**
         * Imperativ, aux.
         */
        VAIMP = 48;

        /**
         * Infinitiv, aux.
         */
        VAINF = 49;

        /**
         * Partizip Perfekt, aux.
         */
        VAPP = 50;

        /**
         * Finites Verb, modal.
         */
        VMFIN = 51;

        /**
         * Infinitiv, modal.
         */
        VMINF = 52;

        /**
         * Partizip Perfekt, modal.
         */
        VMPP = 53;

        /**
         * Nichtwort, Sonderzeichen enthaltend.
         */
        XY = 54;

        /**
         * Komma
         */
        KOMM = 55;

        /**
         * Satzbeendende Interpunktion.
         */
        END = 56;

        /**
         * Sonstige Satzzeichen; satzintern.
         */
        IPNCT = 57;

    }

    /**
     * Part-of-speech tag for this word.
     */
    optional PosTag part_of_speech_tag = 3;
Apart from the questions raised in 1., would this be OK?
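Because three enum items deviate from the original STTS names, a producer that receives raw STTS tags from a tagger would need a small translation step before filling the field. The following is only a minimal sketch of that step in Python. The names `STTS_TO_POSTAG_NAME` and `to_postag_name` are hypothetical and not part of the proposal, and it assumes that all other tags are emitted under the same names as the enum items.

```python
# Hypothetical helper, not part of the proposed type: translates a raw STTS
# tag into the name of the corresponding PosTag enum item.

# Only the three punctuation tags differ, as listed in the enum documentation.
STTS_TO_POSTAG_NAME = {
    "$,": "KOMM",
    "$.": "END",
    "$(": "IPNCT",
}


def to_postag_name(stts_tag: str) -> str:
    """Return the PosTag item name for a raw STTS tag.

    Assumption: every tag other than the three punctuation tags already
    matches an enum item name and is passed through unchanged.
    """
    return STTS_TO_POSTAG_NAME.get(stts_tag, stts_tag)
```

The resulting name could then be resolved to the enum value by whatever lookup the generated code of the target language offers.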
#2 Updated by D. Hamann about 7 years ago
J. Moringen wrote:
There are two issues with this:
- Regarding the tagging scheme itself:
- What is the canonical name and documentation for this tagging scheme?
- The documentation under http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/ suggests that the tagging scheme is called Stuttgart-Tübingen-Tagset (STTS). Is this correct?
- If so, what is the reason for choosing this particular scheme?
- To which languages is this scheme applicable?
Yes, this tagging scheme is called Stuttgart-Tübingen-Tagset (STTS). It is only applicable to German.
This is the default tag set, and it gives the best performance with the German data model of the toolkit from which we receive the tags.
Info about the tagger-models: https://github.com/stanfordnlp/CoreNLP/blob/master/doc/tagger/README-Models.txt
- The individual enum items were not documented.
Based on the above issues, I made the following version of the enum and field:
Apart from the questions raised in 1., would this be OK?
Yes, this is OK.
#3 Updated by D. Hamann about 7 years ago
- Status changed from Feedback to Resolved
- % Done changed from 0 to 100
Applied in changeset rst-proto|ebd6d16853ff239e27fca161f7e46fd3a2bc6530.
#4 Updated by J. Moringen about 7 years ago
- Target version set to rsb-0.17
- % Done changed from 100 to 0
#5 Updated by D. Hamann about 7 years ago
D. Hamann wrote:
Applied in changeset rst-proto|ebd6d16853ff239e27fca161f7e46fd3a2bc6530.
I did not do that. How does this happen from my account!?
#6 Updated by J. Moringen about 7 years ago
I did not do that. How does this happen from my account!?
Sorry, this confuses everybody. I revised your patch, keeping the author metadata of the commit intact, then pushed it. I also noted in the commit message that the commit fixes this issue. Redmine interprets that as you closing the issue because you are the author of the commit.
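The effect can be illustrated with a rough Python sketch. This is not Redmine's actual implementation. The function names are hypothetical, and the keyword handling is an assumption based on the "refs #2738" and "fixes #2738" lines in the commit messages of this issue (the real keyword lists are configurable in Redmine).

```python
import re
from typing import List, Tuple

# Assumption: "refs" only links a commit to an issue, while "fixes" also
# resolves the issue.
KEYWORD_PATTERN = re.compile(r"\b(refs|fixes)\s+#(\d+)", re.IGNORECASE)


def issue_actions(message: str) -> List[Tuple[str, int]]:
    """Return (keyword, issue number) pairs found in a commit message."""
    return [(kw.lower(), int(num)) for kw, num in KEYWORD_PATTERN.findall(message)]


def status_changes(author: str, message: str) -> List[Tuple[str, int, str]]:
    """Return (user, issue number, new status) for every "fixes" keyword.

    The change is attributed to the commit author, not to whoever pushed
    the commit, which is the behavior described in the comment above.
    """
    return [(author, issue, "Resolved")
            for kw, issue in issue_actions(message) if kw == "fixes"]


# The commit keeps D. Hamann as author although somebody else pushed it.
message = "Added field part_of_speech_tag to Word\nfixes #2738"
print(status_changes("D. Hamann", message))
# prints [('D. Hamann', 2738, 'Resolved')]
```

A backport commit that only says "refs #2738" yields no status change in this sketch, so it would be linked to the issue without closing it.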
#7 Updated by J. Moringen about 7 years ago
- % Done changed from 0 to 100
#8 Updated by J. Moringen about 7 years ago
- Related to Feature #2737: Add "Corenlp Dependency Tree" to SpeechHypothesis added