I am studying whether we could migrate the translation tooling from Transifex to Weblate. I started this because, with the current setup, Transifex changes a lot of translations when I upload updates of the translation source, which makes it difficult to keep GitHub and Transifex synchronized.
Weblate is copyleft libre software, and OSGeo hosts its own instance, already used by several OSGeo projects (PostGIS, pgRouting and GRASS GIS at least).
Thanks to Regina Obe, I have set up a GeoServer project on the OSGeo instance to study how Weblate works and whether anything would prevent us from using it.
I already have two points to share with you to get some feedback:
First, when you configure a component in Weblate, you cannot have two items for the same language, even if they use different encodings. As a consequence, I cannot directly integrate most of the core components, since they contain two files for the Chinese language. Is this something that can be changed? Which one is used by GeoServer?
Second, when you change the translation of a text in Weblate, it automatically replaces special characters with their Unicode escape equivalents, even if the character exists in the ISO-8859-1 encoding. For example:
Weblate is a good idea; we held back previously because of a lack of hosting. However if OSGeo is able to host
I do not understand the point about two items for the same language: can you provide links to the GitHub repo? Do you mean two properties files, two entries in the same properties file, or two entries in different properties files?
I am guessing you mean two properties files for the Chinese language, where we followed a Wicket convention for having properties files in different encodings. I think we should just use the UTF-8 encoding, which is not a Java standard but which Wicket supports.
Good guess: it is all about the Chinese language. For Weblate it is the same language defined twice, so it cannot cope with it.
I have to check how Weblate handles one language in UTF-8 while the others are in ISO-8859-1: I fear it will mean having two components side by side.
As for your last point, it seems to have worked well for ages: Romanian mixes both character encodings, whereas Japanese and Korean are written entirely with Unicode-escaped characters inside an ISO-8859-1 encoded file.
I tried keeping only the UTF-8 file as the source for Chinese, alongside the other languages in ISO-8859-1: the display of the file content in Weblate is totally broken. I have to define a separate, artificial component configured for UTF-8 encoding to be able to manage it correctly.
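A plausible explanation for the broken display (an assumption, not something confirmed in this thread) is that the UTF-8 bytes get decoded as ISO-8859-1 because the component is configured for that encoding. A minimal Python sketch of the effect, using a made-up sample string rather than an actual GeoServer translation:

```python
# Hypothetical sample value; not taken from the GeoServer properties files.
text = "中文"

# Bytes as they would sit on disk in a UTF-8 encoded file.
utf8_bytes = text.encode("utf-8")

# Decoding as ISO-8859-1 never raises an error, because every byte maps to
# some character, but each Chinese character turns into three unrelated ones.
misread = utf8_bytes.decode("iso-8859-1")

# Decoding with the right encoding restores the original text.
restored = utf8_bytes.decode("utf-8")

print(len(text), len(misread), restored == text)
```

Since the misreading is silent, a tool configured with the wrong encoding would show garbage instead of reporting an error, which would match what I see.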
You may have to search back in the geoserver-devel history or bug tracker for details. I assume we were provided UTF-8 translations and wished to preserve them?
When using the Weblate UI, can folks see the native characters regardless of the encoding used?
What I remember is that we needed UTF-8 to support some Chinese characters, but the Java properties file format is ISO-8859-1 (that is the actual format on disk).
Wicket had a naming convention where you could append utf8 to the filename, but no other tools understand this.
I had a look at the git history for the UTF-8 file: you created it by renaming the CN ISO file (https://osgeo-org.atlassian.net/browse/GEOS-10282). If I understand the JIRA issue correctly, this was related to the encoding issue in the translation files (the same old story), and the idea was perhaps to migrate all the translations to UTF-8 encoding.
The Weblate UI manages to display the characters in all cases:
ISO-8859 characters in an ISO-8859 encoded file
Unicode-escaped characters (\u00e9 for example) in an ISO-8859 encoded file
UTF-8 file
Weblate can read and write either ISO-8859 files or UTF-8 files, but within a given component you can have only one of the two. For ISO-8859 files, Weblate can read both ISO-8859 characters (é) and Unicode-escaped characters (\u00e9), but when you change a translation, it is always written with Unicode escapes.
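To make the two notations concrete, here is a small Python sketch of the escaping, not Weblate's actual code. The function names and the sample value are made up for illustration, the lowercase hex digits simply follow the \u00e9 example above, and characters outside the Basic Multilingual Plane are not handled.

```python
import re


def escape_non_ascii(value: str) -> str:
    """Write every non-ASCII character as a \\uXXXX escape (BMP only)."""
    parts = []
    for ch in value:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.append("\\u{:04x}".format(ord(ch)))
    return "".join(parts)


def unescape_unicode(value: str) -> str:
    """Turn \\uXXXX escapes back into the characters they stand for."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        value,
    )


# Hypothetical translation value containing a character that ISO-8859-1 can hold.
raw = "Créer"
escaped = escape_non_ascii(raw)

print(escaped)
print(unescape_unicode(escaped) == raw)
```

Both forms decode to the same text, so the rewrite should be harmless for GeoServer at runtime; the cost is only that the diffs in git become noisier and the files less readable by hand.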
As for the UTF-8 files for Chinese, the Transifex update does not apply to them, only to the ones in ISO-8859 encoding: could I remove them from the source code?
I just gave Transifex another try and I still get unwanted changes (translations removed without my asking for it). Checking things at each update would take a lot of work, so I am clearly in favor of moving to the OSGeo Weblate instance. Another advantage I discovered recently is that, with the GitHub integration, 1) a local dev environment is no longer required to synchronize source and translation and 2) administrators are informed when sources need to be updated.
I will do a last check of the different configuration options and write down which ones we would like to use. Meanwhile, I will freeze the translations on Transifex.