Archive September 2010

Create custom notifications on your Ubuntu desktop using Python

You can easily create custom notifications for your applications on Ubuntu. Ubuntu's current notification system is called NotifyOSD. It provides APIs for several languages. With the python-notify library, it is easier in Python than in any other language.

Ubuntu has python-notify installed by default. If it is missing, install it using

sudo apt-get install python-notify

Convert Unicode code points to Unicode hex values in Java

As part of our crawler development, we encountered a Bangla news site (http://www.kalerkantho.com/) that uses code points instead of Unicode hex values in its pages. Although the browser renders the Bangla text correctly, viewing the source shows only code points, so when the crawler downloaded a page we only got &#2453;&#2709;&#1609; or similar. For indexing we needed to convert them to hex values so that the Bangla text renders anywhere. The conversion is really simple.
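As a rough illustration of that conversion, here is a minimal sketch. It assumes the code points arrive as decimal numeric references of the form &#NNNN; and the class and method names (CodePointConverter, decode, toHex) are made up for this example rather than taken from the crawler itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodePointConverter {
    // Assumption: references are decimal, e.g. &#2453; (at most 7 digits).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces every decimal reference with the character it stands for.
    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the text untouched if the number is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the hex value of a decimal code point, padded to four digits.
    static String toHex(int codePoint) {
        return String.format("%04X", codePoint);
    }
}
```

With this sketch, decode("&#2453;") gives the Bangla letter ক, and toHex(2453) gives 0995, the hex value of the same character.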


Extract hyperlinks from HTML using regular expressions in Java

Two years ago, I worked on a crawler that fetches web pages from the internet, parses the links from each page, and then visits all the pages it links to. At that time I had no idea about regular expressions, so I had to write around 500 lines of code to parse links and meta tags from HTML.

Yesterday, I had to do the same job again. This time I used a regular expression to parse the HTML <a tags and their href attributes to extract the links.

Regular expressions can be difficult to understand if written all at once, so I am going to write it in an easy way first, then I'll make it more complex to support variations in page links.
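To give a feel for the approach, here is a minimal sketch of one possible pattern. It is not necessarily the expression used in the crawler: the class name LinkExtractor and the method extractLinks are hypothetical, and it assumes href values are wrapped in single or double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // "<a", then any other attributes, then href = "value" or href = 'value'.
    // Group 1 is the opening quote, group 2 is the link itself.
    // Assumption: unquoted href values are not handled by this sketch.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href value of every <a tag found in the given HTML.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }
}
```

The \b after <a keeps tags such as <abbr from matching, and the \1 backreference makes sure the value ends with the same kind of quote it started with. Relative links come back as written, so a crawler would still need to resolve them against the page URL.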
