Stemmers vs Lemmatizers

0 votes

Natural Language Processing (NLP), especially for English, has evolved to the stage where stemming would become an archaic technology if "perfect" lemmatizers existed. That is because stemmers change the surface form of a word/token into some meaningless stem.

Then again, the definition of the "perfect" lemmatizer is questionable, because different NLP tasks require different levels of lemmatization, e.g. converting words between verb/noun/adjective forms.


Stemmer:

[in]: having
[out]: hav

Lemmatizer:

[in]: having
[out]: have
  • So the question is: are English stemmers of any use at all today, given that we have a plethora of lemmatization tools for English?

  • If not, then how should we move on to build robust lemmatizers that can handle nounify, verbify, adjectify and adverbify preprocessing steps?

  • How could the lemmatization task be easily scaled to other languages that have similar morphological structures as English?

asked Jul 6, 2013 by ugochimbo (860 points)

1 Answer

0 votes
  • Stemmers are much simpler and faster than lemmatizers, and for many applications their results are good enough. Using a lemmatizer in those cases is a waste of resources. Consider, for example, dimensionality reduction in Information Retrieval. You replace all "drive/driving" by "driv" in both the searched documents and the query. You do not care if it is "drive" or "driv" or "x17a$" as long as it clusters inflectionally related words together.
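The retrieval argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stem`, `build_index`, `search`, the suffix list and the sample documents are all hypothetical and invented for this example, and the suffix stripper is far cruder than a real stemmer such as Porter's. The point is that the index key only has to be the same for related forms; it does not have to be a real word.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration; a real stemmer uses many more rules.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every (stemmed) query term."""
    postings = [index.get(toy_stem(term), set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Made-up sample documents.
docs = {
    1: "she is driving home",
    2: "they drive to work",
    3: "he walks home",
}
index = build_index(docs)

print(toy_stem("drive"), toy_stem("driving"))  # driv driv
print(search(index, "drive"))                  # {1, 2}
```

Note that an irregular form such as "drove" would end up as "drov" here and would not be clustered with "drive"; whether that matters is exactly the application-dependent trade-off described above.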

Regarding the other two questions:

  • What is your definition of a lemma: does it include derivation or only inflection?

    If you want to include derivation (which most people would say includes verbing nouns etc.), then keep in mind that derivation is far more irregular than inflection. There are many idiosyncrasies, gaps, etc. Do you really want the verb "to change" and the noun "change" (as in coins) to have the same lemma? If not, where do you draw the boundary? It really depends on the application.

  • What do you mean by "similar morphological structures as English"? English has very little inflectional morphology. There are good lemmatizers for other types of morphology (truly inflectional, agglutinative, templatic, ...).

    With the possible exception of agglutinative languages, I would argue that a lookup table (say, a compressed trie) is the best solution, possibly with some backup rules for unknown words such as proper names. Obviously, for most languages you do not want to create the table by hand, but instead generate it from a description of the morphology of that language. For inflectional languages, you can go the engineering way of Hajic for Czech or Mikheev for Russian, or, if you are daring, use two-level morphology. Or you can do something in between, such as Hana (myself). Note that these are all full morphological analyzers that include lemmatization. Or you can learn the lemmatizer in an unsupervised manner à la Yarowsky and Wicentowski, possibly with manual post-processing to correct the most frequent words. It really depends on what you want to do with the results.
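As a rough sketch of the lookup-table idea, here is a plain (uncompressed) trie that maps word forms to lemmas, with a couple of backup suffix rules for forms missing from the table. Everything in it is an assumption made for illustration: the class names `TrieNode` and `TrieLemmatizer`, the tiny hand-written table, and the two fallback rules. In practice the table would be generated from a morphological description rather than typed in, and the trie would be compressed.

```python
class TrieNode:
    """One character position in the trie; `lemma` is set only at word ends."""
    __slots__ = ("children", "lemma")

    def __init__(self):
        self.children = {}
        self.lemma = None


class TrieLemmatizer:
    def __init__(self, table, fallback_rules=()):
        self.root = TrieNode()
        self.fallback_rules = tuple(fallback_rules)
        for form, lemma in table.items():
            self._insert(form, lemma)

    def _insert(self, form, lemma):
        node = self.root
        for ch in form:
            node = node.children.setdefault(ch, TrieNode())
        node.lemma = lemma

    def _lookup(self, form):
        node = self.root
        for ch in form:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.lemma

    def lemmatize(self, word):
        form = word.lower()
        lemma = self._lookup(form)
        if lemma is not None:
            return lemma
        # Backup rules for unknown words: (suffix, replacement) pairs.
        for suffix, replacement in self.fallback_rules:
            if form.endswith(suffix) and len(form) > len(suffix) + 2:
                return form[: -len(suffix)] + replacement
        # Nothing applies (e.g. a proper name): return the word unchanged.
        return word


# Hypothetical hand-written table, for illustration only.
table = {"having": "have", "has": "have", "had": "have",
         "driving": "drive", "drove": "drive", "mice": "mouse"}
lemmatizer = TrieLemmatizer(table, fallback_rules=[("ies", "y"), ("ing", "")])

print(lemmatizer.lemmatize("having"))    # have
print(lemmatizer.lemmatize("drove"))     # drive
print(lemmatizer.lemmatize("tweeting"))  # tweet (via the fallback rule)
print(lemmatizer.lemmatize("Lagos"))     # Lagos (unknown, left as is)
```

Unlike a stemmer, the table handles irregular forms such as "drove" and "mice" directly; the cost is that coverage depends entirely on how the table was generated.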

answered Jul 6, 2013 by lordchimbo (150 points)