Posted by: Paraic | July 26, 2008

Languages of India

I had the opportunity at the SIGIR conference to attend a tutorial on Languages of India (for Information Retrieval).  This map is a great illustration of not just the language diversity but also the diversity of scripts used in the various languages.

[Map: Languages of India]

The Constitution of India recognizes 22 languages, though 29 languages are each spoken by more than a million native speakers.

As for the internet, according to the Internet World Stats figures quoted in the tutorial, about 60 million of India's 1.1 billion people are online. That’s only just over 5% penetration, though internet usage growth is put at 1100% (2000-2007).

The CLIA project, funded by the Department of Information Technology in India, brings together a consortium of 11 institutes to promote online information access in Indian languages.  Its Forum for Information Retrieval Evaluation (FIRE) will put in place an infrastructure and a set of resources to support the development and evaluation of information access in Hindi, Bangla, Marathi, Tamil, Telugu, Punjabi and Malayalam (and English).

I very much enjoyed the tutorial (lots of information about the morphology and syntax of the various languages that I won’t go into here) and I look forward to seeing the success of the FIRE initiative.

