Naive text cleaning
May 11th, 2011
Text is unstructured data from which we can extract entities. Assume that the text is a random sequence of words, punctuation and noise. You can find a really naive text cleaner under the cut.
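As a rough illustration of what "naive" can mean here, the sketch below lowercases the text, replaces punctuation with spaces, and drops tokens that contain no letters. The function name `clean_text` and the rule for what counts as noise are assumptions made for this example, not the cleaner from the full post.

```python
import re


def clean_text(text: str) -> list[str]:
    """Return lowercase word tokens, dropping punctuation and letterless noise."""
    # Replace every character that is neither a word character nor whitespace
    # with a space. The underscore counts as a word character, so handle it too.
    stripped = re.sub(r"[^\w\s]|_", " ", text.lower())
    # Assumption: a token without any letter (e.g. "42") is treated as noise.
    return [tok for tok in stripped.split() if any(ch.isalpha() for ch in tok)]


tokens = clean_text("Hello, world!!! ### 42 foo_bar")
print(tokens)
```

A cleaner like this knows nothing about language, which is exactly why it is naive: it only separates word-like tokens from everything else.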
Read more...
Password recovery in PostgreSQL
May 9th, 2011
I forgot the password for the postgres user, so I googled how to recover it.
Read more...
Java stuff. Part 1
May 6th, 2011
There is some Java stuff under the cut.
Read more...
Boolean search
May 3rd, 2011
Boolean search is an information retrieval model in which a document is treated as a bag of words. A query is posed as a Boolean expression of terms, where a term is a word.
The bag of words
Let us look closely at the model through an example.
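One way to picture the model is the minimal sketch below. The sample documents and the helper names (`bag_of_words`, `build_index`, `term`) are made up for illustration. Since the Boolean model only cares whether a word is present, each bag is reduced to a set of words here, and AND, OR and NOT become set intersection, union and difference.

```python
# Hypothetical sample collection: document id -> text.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}


def bag_of_words(text):
    """Reduce a document to the set of words it contains (order is ignored)."""
    return set(text.lower().split())


def build_index(documents):
    """Map every term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in bag_of_words(text):
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(docs)


def term(word):
    """Ids of the documents containing the word (empty set if it is unknown)."""
    return index.get(word, set())


quick_and_dog = term("quick") & term("dog")   # quick AND dog
quick_or_lazy = term("quick") | term("lazy")  # quick OR lazy
dog_not_quick = term("dog") - term("quick")   # dog AND NOT quick

print(quick_and_dog, quick_or_lazy, dog_not_quick)
```

A document either matches the expression or it does not, so the result is a plain set of ids with no ordering by relevance.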
Read more...
Full text search in PostgreSQL
April 14th, 2011
Most web applications need search functionality. The easiest way to build search into a database-driven web application yourself is to use regexps. Developers usually remember that this approach has some disadvantages.
  • A pattern matching query processes all documents every time, and there is no index support.
  • There is no linguistic support, e.g. if you search for documents that contain "entry", documents that contain "entries" will be missed.
  • There is no relevance, ranking, etc.
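The drawbacks listed above can be seen in a small sketch. It uses Python's `re` module on an in-memory list rather than a database, and the sample documents and the `regex_search` helper are assumptions made for this example.

```python
import re

# Hypothetical sample documents standing in for rows of a database table.
documents = [
    "A new entry was added to the log",
    "All entries are sorted by date",
    "Nothing relevant here",
]


def regex_search(pattern, docs):
    """Return the positions of the documents that match the pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    # Every document is scanned on every query: there is no index to consult.
    return [i for i, text in enumerate(docs) if compiled.search(text)]


# Searching for the whole word "entry" misses the document with "entries",
# because a regexp knows nothing about word forms.
hits = regex_search(r"\bentry\b", documents)
print(hits)
```

The hits also come back in storage order: nothing in the match says which document is more relevant to the query.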
So why do developers use regexps?
Read more...
Moi krug - Yernat Assanov
(C) 2010, kseeker
Email: kseeker@yandex.kz