Researchers at the Regional Information Center for Science and Technology Design the First Persian Stemmer

Researchers at the Regional Information Center for Science and Technology (RICeST) have designed the first Persian stemmer, a tool that can be used for categorizing, processing and retrieving information.

The first Persian stemmer was designed at the Regional Information Center for Science and Technology in 2009, 41 years after the world's first stemmer, the Lovins stemmer, was designed in 1968. Martin Porter's stemmer followed in 1980. These two stemmers, the most important ever used, are monolingual and were designed for English.

Stemming has since been studied for other languages, such as Spanish and Arabic, and multilingual stemmers have also been developed.

To date, most stemmers have used algorithms similar to Porter's, which is why they all share the same advantages and disadvantages; they differ only in minor details, such as their word lists and the number of rules. RICeST's Persian stemmer, by contrast, combines linguistic knowledge with standard algorithms, supported by 10 plural suffixes and nearly 2,000 exceptions (irregular plural nouns).

RICeST's Persian stemmer can derive singular nouns from plural ones. The system can distinguish between singular and plural nouns, identify plural suffixes, remove those suffixes to produce the singular noun, and recognize exceptions.
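
The article does not publish RICeST's algorithm, but the steps it lists (checking an exception list of irregular plurals, identifying a plural suffix, and removing it to obtain the singular) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the suffix list, the three irregular plurals, the function name `singularize`, the `MIN_STEM_LEN` threshold and the optional `lexicon` check are all illustrative choices, not RICeST's actual 10 suffixes or its roughly 2,000-entry exception list.

```python
# Minimal sketch of plural-to-singular stemming for Persian nouns.
# ASSUMPTION: the suffix list and exception table below are illustrative
# samples; the article does not publish RICeST's 10 suffixes or its
# ~2,000 irregular plurals.

ZWNJ = "\u200c"  # zero-width non-joiner, often written before the suffix "ها"
MIN_STEM_LEN = 2  # hypothetical guard against stripping very short words

# Hypothetical sample of regular plural suffixes.
PLURAL_SUFFIXES = ["های", "ها", "یان", "ات", "ان", "ین"]

# Hypothetical sample of irregular (broken) plurals borrowed from Arabic.
IRREGULAR_PLURALS = {
    "کتب": "کتاب",   # kotob -> ketab (book)
    "علوم": "علم",   # olum -> elm (science)
    "اخبار": "خبر",  # akhbar -> khabar (news item)
}


def singularize(word, lexicon=None):
    """Return (singular_form, is_plural) for a Persian noun.

    If `lexicon` (a set of known singular nouns) is given, a suffix is
    stripped only when the remaining stem is in it.
    """
    word = word.strip()
    # Step 1: irregular plurals are looked up, not stripped.
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word], True
    # Step 2: try regular suffixes, longest first, so "های" wins over "ها".
    for suffix in sorted(PLURAL_SUFFIXES, key=len, reverse=True):
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)].rstrip(ZWNJ)
        if len(stem) < MIN_STEM_LEN:
            continue
        if lexicon is not None and stem not in lexicon:
            continue
        return stem, True
    # Step 3: no plural marker matched, so treat the word as singular.
    return word, False


print(singularize("کتاب\u200cها"))  # ('کتاب', True)
print(singularize("کتب"))          # ('کتاب', True)
print(singularize("کتاب"))         # ('کتاب', False)
```

The optional lexicon matters because naive suffix stripping misfires on singular words that merely end in a suffix-like string: without it, ایران (Iran) would be "singularized" to ایر. A practical stemmer would also need rules this sketch omits, such as restoring the final ه dropped in plurals like ستارگان (stars, from ستاره).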

The system has many applications, such as reducing the different forms of a noun to a single form. It can also be used for the automatic categorization of texts in large computer files. In information retrieval, the software can be used to search on word stems, saving 30 to 35 percent of data storage space.
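
One way stemming reduces the different forms of a noun, and with them the size of a search index, is by merging the postings of inflected forms under a single index key. The Python sketch below assumes a hypothetical `build_index` helper and a toy lookup table (`TOY_STEMS`) standing in for a real stemmer; it illustrates the effect on vocabulary size only and does not reproduce the 30 to 35 percent storage figure reported for RICeST's system.

```python
from collections import defaultdict

# Hypothetical toy mapping standing in for a real stemmer such as RICeST's.
TOY_STEMS = {
    "کتاب\u200cها": "کتاب",  # ketab-ha (books) -> ketab
    "کتب": "کتاب",           # kotob (books, irregular plural) -> ketab
    "درختان": "درخت",        # derakhtan (trees) -> derakht
}


def toy_stem(token):
    """Return the toy stem of a token, or the token itself if unknown."""
    return TOY_STEMS.get(token, token)


def build_index(docs, stem):
    """Map each (stemmed) term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return dict(index)


docs = {
    1: "کتاب کتاب\u200cها درخت",
    2: "کتب درختان",
}

raw = build_index(docs, stem=lambda t: t)
stemmed = build_index(docs, stem=toy_stem)
print(len(raw), len(stemmed))  # 5 2
```

Without stemming, the five surface forms each get their own index entry; with stemming they collapse into two entries (ketab and derakht), each pointing to both documents, which is also why a query for one form can retrieve documents that use another.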