Naive text cleaning
May 11th, 2011
Text is unstructured data from which we can extract entities.
Assume that the text is a random sequence of words, punctuation and noise.
You can find a really naive text cleaner after the cut.
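As a rough illustration of what such a cleaner might look like (a minimal sketch, not the code from the post itself), the snippet below lowercases the text, treats every character that is not an ASCII letter, digit or whitespace as noise, and splits the rest into words. The function name `naive_clean` and the choice of what counts as noise are assumptions made for this example.

```python
import re


def naive_clean(text):
    """Return the list of lowercase words found in text.

    Assumption for this sketch: anything that is not an ASCII letter,
    a digit or whitespace is treated as punctuation or noise.
    """
    lowered = text.lower()
    # Replace every noise character with a space so that words
    # glued together by punctuation are still separated.
    without_noise = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # split() with no arguments also collapses runs of whitespace.
    return without_noise.split()


if __name__ == "__main__":
    print(naive_clean("Hello, world!!! ### It's 2011..."))
```

Being naive, this sketch also splits contractions such as "It's" into two tokens and drops all non-ASCII letters.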
Password recovery in PostgreSQL
May 9th, 2011
I forgot the password for the postgres user.
I googled how to recover it.
Boolean search
May 3rd, 2011
Boolean search is an information retrieval model in which a document is treated as a bag of words.
A query is posed as a Boolean expression of terms, where a term is a word.
Let us look at the model more closely through an example.
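One way to picture the model is the following minimal sketch. The three sample documents and the helper names (`build_index`, `term`) are made up for this illustration. Each term maps to the set of documents that contain it, and the Boolean operators AND, OR and NOT become set intersection, union and difference.

```python
# Hypothetical sample collection: document id -> text.
DOCS = {
    1: "postgresql full text search",
    2: "boolean search model",
    3: "naive text cleaning",
}


def build_index(docs):
    """Map every term to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Return the ids of documents that contain the given word."""
    return INDEX.get(word, set())


if __name__ == "__main__":
    # search AND text
    print(sorted(term("search") & term("text")))
    # boolean OR naive
    print(sorted(term("boolean") | term("naive")))
    # search AND NOT boolean
    print(sorted(term("search") - term("boolean")))
```

A document either satisfies the expression or it does not, so the result is an unordered set of matches with no notion of ranking.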
Full text search in PostgreSQL
April 14th, 2011
Most web applications need search functionality.
The easiest way to build search yourself in a database-driven web application is to use regexps.
Developers usually know that this approach has some disadvantages:
- A pattern matching query processes all documents every time, and there is no index support.
- There is no linguistic support, e.g. if you search for documents that contain "entry", documents that contain "entries" will be missed.
- There is no relevance ranking, etc.
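These drawbacks are easy to reproduce outside the database. The sketch below is an illustration in Python, not PostgreSQL code, and the sample documents and the function name `regex_search` are assumptions. It scans every document on each query, misses the plural form, and returns matches in storage order without any ranking.

```python
import re

# Hypothetical sample documents.
DOCUMENTS = [
    "The log has one entry.",
    "The log has many entries.",
    "Nothing relevant here.",
]


def regex_search(pattern, documents):
    """Return the documents matching pattern, in their original order.

    Every document is scanned on every call: no index is involved.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    return [doc for doc in documents if compiled.search(doc)]


if __name__ == "__main__":
    # Only the document with the exact word "entry" is found;
    # the one that contains "entries" is missed.
    print(regex_search(r"\bentry\b", DOCUMENTS))
```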