Most of the applications we roll out from our software factories are initially targeted at a specific customer base located in a particular geographical region. But in today’s era of globalization, the whole world is one market. Somewhere down the road, most enterprise-grade applications will have to undergo internationalization (i18n). So it’s extremely important to keep that in mind while conceiving an application from an architectural and design perspective.
Let’s have a look at the typical data elements an application exposes to its end users. These include labels, headers, footers, images, help documents, pop-up messages, etc. The application will also send out messages in the form of mails, SMS and others. When we look at the design of the system from an i18n perspective, the first challenge is the ability of the application to generate those elements as per the user’s locale. Another challenge, which we tend to trivialize, is the need to manage that data. This can become a potential headache for enterprise-grade applications if not addressed properly. A third challenge comes in the form of translation. The development team that builds the application will not be conversant with all the target languages; most likely its members will be limited to English and their native languages. So the team has to depend on third-party vendors for translation. This means the data needs to be exchanged with an external vendor for translation, and the translated data then has to be stored back in the system. This also stresses the need for data management.
Data Management Approaches
Let’s have a look at the data management approaches in terms of how the data is stored. In a typical application, data is stored in the file system and in database systems.
Data Stored in file system
The data stored in the file system can be of the following types:
- Images, JavaScript and HTML stored in the file system of the web server
- Help documents in PDF/HTML form stored in the file system of the web server
- Message/notification templates stored in the file system of the application server
In this scenario, the team should first identify the data elements that are subject to internationalization. Once this identification is done, the i18n literals have to be externalized from those data elements and stored in the database. For example, in the case of an image showing “Save”, the literal “Save” can be externalized and stored in the database. This data can be sent to a third-party vendor for translation, and the translated values for other locales can then be stored. In this case, a better approach is to generate those images at runtime rather than storing images for every possible combination of locales in the file system.
It’s advisable to store i18n-specific literals in the database rather than in the file system, because a database-based approach allows centralized administration and management of the data. This helps a great deal in release management and translation management.
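As a rough illustration of the externalization described above, here is a minimal Python sketch that keeps literals in a database table keyed by literal and locale. The table name `i18n_literal`, the keys such as `button.save`, the fallback to `en_US` and the helper `get_literal` are all assumptions made for this example, not part of any particular product.

```python
import sqlite3

# Hypothetical schema: one row per (literal key, locale) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE i18n_literal ("
    " literal_key TEXT NOT NULL,"
    " locale TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " PRIMARY KEY (literal_key, locale))"
)
conn.executemany(
    "INSERT INTO i18n_literal VALUES (?, ?, ?)",
    [
        ("button.save", "en_US", "Save"),
        ("button.save", "fr_FR", "Enregistrer"),
        ("button.cancel", "en_US", "Cancel"),  # not yet translated to fr_FR
    ],
)

DEFAULT_LOCALE = "en_US"  # assumed fallback locale


def get_literal(conn, key, locale):
    """Return the literal for the locale, falling back to the default locale."""
    for candidate in (locale, DEFAULT_LOCALE):
        row = conn.execute(
            "SELECT value FROM i18n_literal"
            " WHERE literal_key = ? AND locale = ?",
            (key, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    raise KeyError(key)


print(get_literal(conn, "button.save", "fr_FR"))
print(get_literal(conn, "button.cancel", "fr_FR"))
```

The fallback to a default locale is one possible design choice; it lets a release go out before every literal has come back from the translator.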
Data Stored in database
There are primarily two types of data stored in the database. One is transactional data, which is dynamic in nature, and the other is reference data, which is static in nature. It’s not advisable to translate transactional data at runtime. Currently, tools are not mature enough to do translation at runtime, and this approach may sometimes generate wrong data, which can lead to litigation issues. At the same time, certain aspects of the data can be subjected to i18n. For example, a date should be displayed in MM/DD/YYYY format for the en_US locale and in DD/MM/YYYY format for users in India. So it’s better to externalize display formats such as date formats and number formats from the data elements and store them in the database.
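A small Python sketch of the date format example: the stored value stays locale-neutral and only the display pattern varies by locale. The dictionary `DATE_PATTERNS` stands in for a database table, and the locale code `en_IN` for users in India and the helper `format_date` are assumptions made for illustration.

```python
from datetime import date

# Assumed display patterns per locale. In the approach described above,
# these would be rows in a database table rather than a dict.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",  # MM/DD/YYYY
    "en_IN": "%d/%m/%Y",  # DD/MM/YYYY
}


def format_date(value, locale, default_locale="en_US"):
    """Format a date using the pattern externalized for the given locale."""
    pattern = DATE_PATTERNS.get(locale, DATE_PATTERNS[default_locale])
    return value.strftime(pattern)


d = date(2009, 3, 14)
print(format_date(d, "en_US"))  # 03/14/2009
print(format_date(d, "en_IN"))  # 14/03/2009
```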
Translation Management
One of the main areas in i18n data management is translation management. From an application development perspective, translation management basically involves:
- Extracting the data for translation
- Exchanging the data with a third-party vendor for translation
- Storing the translated data back in the data store
To enable this process, infrastructure needs to be built to manage the whole cycle. One of the main things to consider is the data exchange format used with the translator. Is it going to be CSV, a DB dump, or the proprietary format of the translation vendor? It’s not advisable to go with a proprietary format, because the infrastructure would need to be changed whenever we switch from one vendor to another, and that means additional cost. So it is better to adopt a standards-based format. XLIFF fits in here.
XLIFF (XML Localization Interchange File Format) is an XML-based format created to standardize localization. It is intended to give any software provider a single interchange file format that can be understood by any localization provider. It helps in lossless conversion to and from different formats and also avoids problems related to encodings and character sets. It basically acts as metadata for the communication between developer and translator, and it also helps in version management. Since it is XML-based, it can be validated against an XSD/DTD.
XLIFF Snippet
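Here is a hedged Python sketch of the exchange cycle using the XLIFF 1.2 element names (`xliff`, `file`, `body`, `trans-unit`, `source`, `target`) and only the standard library. The function names `export_xliff` and `import_xliff`, the `original` value and the sample keys are assumptions made for illustration; a real setup would also validate the file against the XLIFF schema.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace


def _tag(name):
    """Return the namespace-qualified tag name for an XLIFF element."""
    return f"{{{XLIFF_NS}}}{name}"


def export_xliff(literals, source_lang, target_lang, original="i18n_literal"):
    """Build an XLIFF 1.2 document with one trans-unit per literal."""
    root = ET.Element(_tag("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, _tag("file"), {
        "original": original,  # hypothetical name of the data source
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, _tag("body"))
    for key, text in literals.items():
        unit = ET.SubElement(body, _tag("trans-unit"), {"id": key})
        ET.SubElement(unit, _tag("source")).text = text
    return ET.tostring(root, encoding="unicode")


def import_xliff(xml_text):
    """Collect translated targets, keyed by trans-unit id."""
    ns = {"x": XLIFF_NS}
    translated = {}
    for unit in ET.fromstring(xml_text).iterfind(".//x:trans-unit", ns):
        target = unit.find("x:target", ns)
        if target is not None and target.text:
            translated[unit.get("id")] = target.text
    return translated


print(export_xliff({"button.save": "Save"}, "en-US", "fr-FR"))
```

The exported file carries only `source` elements; the translator is expected to add a `target` to each `trans-unit`, and `import_xliff` then picks up those targets so they can be stored back against the same keys.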
Conclusion
The intent of this post was to raise awareness of the need for an i18n data management strategy rather than to recommend a particular one. This is extremely crucial for enterprise-grade applications. There have been instances where I had to write a white paper to convince people of the need for it. As discussed in my previous blog post, it’s better to take up the i18n initiative in the early part of the application life cycle rather than treating it as a separate activity. Treating it as a separate activity makes it difficult for two reasons.
- If the design of the system is not flexible, the whole exercise will involve changing the design and implementation of almost the entire system.
- For the above reason, the cost of the effort may go up and there may be a considerable delay in the rollout of the internationalized application.
It should start in the requirements phase, by capturing i18n-specific requirements. During the architecture and design phase, the architect/designer should define the i18n strategies that will address those requirements, including the data management part. The QA team should plan the data needed for successful execution of their test cases.
*There are many places where I have used i18n and L10n interchangeably. There are a few subtle differences. To find out more, please have a look at this: http://en.wikipedia.org/wiki/Internationalization_and_localization
To find out more details on XLIFF:
http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html