[GeoNetwork-users] Problem with accented characters in POST and GET methods

Dear all,

I have a problem with accented characters in the search and advanced search
forms.

The characters 'é è' are transformed into, for example, 'Ã©Ã¨'.
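The garbled output is the typical signature of UTF-8 bytes being decoded as ISO-8859-1 (Latin-1): "é" travels as the two bytes C3 A9, and Latin-1 reads those as "Ã" and "©". The minimal JavaScript sketch below reproduces the effect and then reverses it. It assumes Node.js 11 or later (for the global TextEncoder/TextDecoder), and the helper names are hypothetical:

```javascript
// Encode text as UTF-8 bytes, as a browser does for form data and URLs.
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

// Decode bytes as ISO-8859-1: each byte becomes the code point with the same value.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Undo the damage: recover the original bytes from the Latin-1 string,
// then decode them as UTF-8.
function repairMojibake(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const garbled = decodeLatin1(utf8Bytes("éè"));
console.log(garbled);                 // Ã©Ã¨
console.log(repairMojibake(garbled)); // éè
```

The repair only works if every byte of the garbled string survived. Treat it as a diagnostic aid: the real fix is to decode as UTF-8 on the server.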

I don't have this problem when editing metadata: there, these characters are
sent and interpreted correctly.
(I did have it before, when I used enctype="multipart/form-data" for data
upload, but I no longer use it.)

I looked at the differences between the metadata editing part and the
advanced search, and the only difference I see is the use of
accept-charset="UTF-8" on the form in metadata-edit. I tried it on the
advanced search, but that changed nothing :(

For some of my customisations I also use GET methods, and I have the same
character-encoding problem.

Any ideas are welcome!

Regards,
Fabien Bachraty
--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p18892552.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Perhaps we could add code that makes sure requests and responses are
processed as UTF-8, e.g. in a servlet filter? See
http://www.java2s.com/Code/Java/Servlets/FilteringpagetoUTF8.htm for an
example.
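The usual idea behind such a filter is to call request.setCharacterEncoding("UTF-8") before any parameter is read, unless the client has already declared a charset. That stops the container from falling back to ISO-8859-1. The decision step can be sketched in JavaScript as below (a hypothetical helper, not the java2s code).

One caveat, as far as Tomcat is concerned: this charset applies to the request body (POST parameters). The query string of a GET request is decoded with the connector's URIEncoding attribute unless useBodyEncodingForURI is enabled.

```javascript
// Decide which charset to use for decoding a request body, in the spirit of a
// "force UTF-8" servlet filter: keep a charset the client declared in its
// Content-Type header, otherwise use UTF-8 rather than the servlet default
// of ISO-8859-1.
function effectiveRequestCharset(contentType, fallback = "utf-8") {
  if (!contentType) return fallback;
  // Match e.g. '; charset=ISO-8859-1' or '; charset="UTF-8"'.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : fallback;
}

console.log(effectiveRequestCharset("application/x-www-form-urlencoded"));
// utf-8
console.log(effectiveRequestCharset("text/xml; charset=ISO-8859-1"));
// iso-8859-1
```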

Also, we should take a look at the Lucene analyzer we're using: it would
be nice if a search for "moçambique" matched results containing
"mocambique", and vice versa.

Regards,
Heikki Doeleman

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

Hi Heikki,

On Fri, 2008-08-08 at 07:36 -0700, heikki wrote:

> Also, we should take a look at the Lucene analyzer we're using: it would
> be nice if a search for "moçambique" matched results containing
> "mocambique", and vice versa.

Fuzzy search supports that.

Francois

True, but then you must do fuzzy searches, and how many innocent users know
exactly what a good fuzzy factor (neither too high nor too low) is?
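To make the point concrete: Lucene's fuzzy queries are built on edit (Levenshtein) distance, and that measure treats "ç" versus "c" exactly like any other one-character substitution. The small JavaScript sketch below computes that distance. It is illustrative only: it is not Lucene's implementation and ignores how Lucene scales the distance into a similarity.

```javascript
// Classic dynamic-programming Levenshtein distance over Unicode code points.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the first i-1 characters of s and the first j of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

console.log(levenshtein("moçambique", "mocambique"));           // 1
console.log(levenshtein("côte", "cote"), levenshtein("cote", "vote")); // 1 1
```

Any threshold loose enough to accept "côte" for "cote" also accepts "vote". That is why folding accents in the analyzer is the more precise tool.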

If we use a custom analyzer that adds ISOLatin1AccentFilter to the
StandardAnalyzer, we'd have that effect in all queries, fuzzy or not.
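The sketch below is a rough JavaScript approximation of such an analyzer chain: tokenize, lowercase, strip accents. It removes accents by Unicode NFD decomposition, which is a swapped-in technique rather than what ISOLatin1AccentFilter does. That filter uses a fixed Latin-1 mapping table and also rewrites ligatures such as "æ" to "ae", which NFD leaves untouched. Unicode property escapes require Node.js 10 or later.

```javascript
// Split text into letter/digit tokens, lowercase them and strip diacritics
// (NFD splits "ç" into "c" + combining cedilla; the combining mark is removed),
// so accented and unaccented spellings index to the same term.
function analyze(text) {
  return text
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase().normalize("NFD").replace(/\p{M}/gu, ""));
}

console.log(analyze("Moçambique"));          // [ 'mocambique' ]
console.log(analyze("Données vulnérables")); // [ 'donnees', 'vulnerables' ]
```

If the same function runs at index time and at query time, "moçambique" and "mocambique" produce the same term in both directions.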

Or do you think that would be undesirable?

regards,
Heikki


On Fri, 2008-08-08 at 11:47 -0700, heikki wrote:

> True, but then you must do fuzzy searches, and how many innocent users
> know exactly what a good fuzzy factor (neither too high nor too low) is?

So the fuzzy factor is more useful for typos, and the filter could take
care of accents: "moçambique" and "mocambique" would both be handled by an
ISOLatin1AccentFilter?

> If we use a custom analyzer that adds ISOLatin1AccentFilter to the
> StandardAnalyzer, we'd have that effect in all queries, fuzzy or not.

I was also looking into multilingual search support. It looks like the
best way to support that in Lucene is to have one index per language, so
that you can use a different analyzer for each language and also define a
stopword list per language.

Does anyone have recommendations on that?
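A per-language setup could be sketched as follows: pick the index and the stopword list from the language, then analyze with that language's rules. The index naming scheme and the very short stopword lists are illustrative assumptions, not GeoNetwork or Lucene configuration.

```javascript
// Illustrative stopword subsets; real lists would be much longer.
const STOPWORDS = {
  en: new Set(["the", "of", "and", "a", "in"]),
  fr: new Set(["le", "la", "les", "de", "des", "et"]),
};

// Route text to a per-language index (hypothetical "index-<lang>" naming)
// and drop that language's stopwords from its tokens.
function analyzeForLanguage(text, lang) {
  const stopwords = STOPWORDS[lang] || new Set();
  const tokens = text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0 && !stopwords.has(token));
  return { index: `index-${lang}`, tokens };
}

console.log(analyzeForLanguage("La carte de la région", "fr").tokens); // [ 'carte', 'région' ]
console.log(analyzeForLanguage("The map of the region", "en").tokens); // [ 'map', 'region' ]
```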

Ciao. Francois


Take a look at the Lucene mailing lists; multilingual search is discussed
there often. Using a separate index per language is a viable approach, but
it is not strictly necessary in order to use different analyzers. In any
case, it seems to me you must know which language the user is searching in.
How would you handle that? Use the Locale as the default, plus an option to
select a different language?
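The "Locale as default, user option as override" approach might look like the sketch below. It takes the default from the browser's Accept-Language header, which is also what the servlet API's request.getLocale() is derived from. The function name and parameters are hypothetical.

```javascript
// Choose the search language: an explicit user choice wins; otherwise take the
// highest-weighted Accept-Language entry we support; otherwise use the fallback.
function pickSearchLanguage(userChoice, acceptLanguage, supported, fallback) {
  if (userChoice && supported.includes(userChoice)) return userChoice;
  const ranked = (acceptLanguage || "")
    .split(",")
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1;
      // Keep only the primary subtag: "fr-FR" -> "fr".
      return { lang: tag.trim().split("-")[0].toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.lang && entry.q > 0)
    // Higher q first; ties keep the header's original order.
    .sort((a, b) => b.q - a.q || a.index - b.index);
  const match = ranked.find((entry) => supported.includes(entry.lang));
  return match ? match.lang : fallback;
}

console.log(pickSearchLanguage(null, "fr-FR,fr;q=0.9,en;q=0.8", ["en", "fr"], "en")); // fr
console.log(pickSearchLanguage("en", "fr-FR", ["en", "fr"], "en"));                   // en
```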

As to the ISOLatin1AccentFilter, I'm in favour of using it!


Thank you, Heikki and François-Xavier.

As François-Xavier said, the difference between metadata editing and the
advanced search is the use of JavaScript.

I had a look at gn_search.js:

function gn_search(pars)
{
  var myAjax = new Ajax.Request(
    getGNServiceURL('main.search.embedded'),
    {
      method: 'get',
      parameters: pars,
      onSuccess: gn_search_complete,
      onFailure: gn_search_error
    }
  );
}

But I'm not a JavaScript and Ajax specialist, and I can't find a way to
specify or force UTF-8 encoding.
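As far as we know, Prototype's Ajax.Request serializes an object passed as parameters with encodeURIComponent. That function always percent-encodes non-ASCII characters as their UTF-8 bytes (a string passed as parameters is sent as given). If this holds for gn_search, the browser is already sending UTF-8, and the real question is how the server decodes it. The hypothetical helper below shows what such a query string looks like:

```javascript
// Build a query string the way browser-side libraries typically do:
// encodeURIComponent turns each non-ASCII character into its UTF-8 bytes as %XX escapes.
function toQueryString(params) {
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

console.log(toQueryString({ any: "vulnérable" })); // any=vuln%C3%A9rable
```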

Anyway, thank you!

Best regards,
Fabien Bachraty.
--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p18920829.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hi Fabien,

On Sun, 2008-08-10 at 23:35 -0700, FBachraty wrote:

> But I'm not a JavaScript and Ajax specialist, and I can't find a way to
> specify or force UTF-8 encoding.

Could you try a search with the xml.search service, e.g.
http://localhost:8080/geonetwork/srv/en/xml.search?any=vulnérable, to see
whether the problem is linked to the JS or to the search service?

JS search and the *.search services work with accents for me on trunk.

Francois

I also have the problem with the GET method.

In Xml.search.java I added a log statement:

public Element exec(Element params, ServiceContext context) throws Exception
{
  ...
  context.error("Test : " + Xml.getString(params));
  ...

Requesting http://127.0.0.1:8180/geonetwork/srv/en/xml.search?any=vulnérable
writes this to the Catalina log:

ERROR [jeeves.webapp.xml.search] - Test : <request>
  <any>vulnérable</any>
</request>

So JavaScript is innocent, isn't it?
--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p18924676.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

FBachraty wrote:

> I have a problem with accented characters in the search and advanced
> search forms.
>
> The characters 'é è' are transformed into, for example, 'Ã©Ã¨'.

Make sure your Tomcat "server.xml" config file has URIEncoding="UTF-8" set
in your Connector tag. With this, I can successfully search for French
characters without any problems.

For Asian characters such as the Japanese "マイクロ要素構成学", there will be no
results, since GeoNetwork indexes each Japanese character as a whole word.
Instead of "マイクロ要素構成学", search for "マ イ ク ロ 要 素 構 成 学", i.e. add a
space between each character. That works well.
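This workaround could be automated on the client by putting a space around each Chinese or Japanese character before the query is sent, while leaving other words untouched. The sketch below works under that assumption. The helper name is hypothetical, and Unicode script escapes need Node.js 10+ or a modern browser. Characters whose Unicode script is Common, such as punctuation, are treated like ordinary word characters.

```javascript
// Matches a single Han, Hiragana or Katakana character.
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Emit every CJK character as its own token; keep other runs of
// non-whitespace characters together as words.
function spaceOutCjk(query) {
  const out = [];
  let word = "";
  for (const ch of query) {
    if (CJK_CHAR.test(ch)) {
      if (word) { out.push(word); word = ""; }
      out.push(ch);
    } else if (/\s/.test(ch)) {
      if (word) { out.push(word); word = ""; }
    } else {
      word += ch;
    }
  }
  if (word) out.push(word);
  return out.join(" ");
}

console.log(spaceOutCjk("マイクロ要素構成学")); // マ イ ク ロ 要 素 構 成 学
```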

So add URIEncoding="UTF-8" to the <Connector ...> tag in
\Tomcat 5.5\conf\server.xml.

It should look like this:
    <Connector .... .... URIEncoding="UTF-8" ... .../>
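Why this helps: the browser sends "vulnérable" as vuln%C3%A9rable, and the connector must turn the %XX bytes back into characters using some charset. Tomcat 5.5 uses ISO-8859-1 unless URIEncoding says otherwise. The JavaScript sketch below models that decoding step with a hypothetical helper that handles only the two charsets discussed here:

```javascript
// Percent-decode a query-string value to raw bytes, then turn the bytes into
// text with the given charset, which is the step URIEncoding controls.
function decodeQueryValue(raw, uriEncoding) {
  const bytes = [];
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    const hex = raw.slice(i + 1, i + 3);
    if (ch === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else if (ch === "+") {
      bytes.push(0x20); // form encoding uses "+" for a space
    } else {
      bytes.push(ch.charCodeAt(0)); // assumes plain ASCII outside %XX escapes
    }
  }
  if (uriEncoding === "UTF-8") return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
  if (uriEncoding === "ISO-8859-1") return String.fromCharCode(...bytes);
  throw new Error(`charset not handled in this sketch: ${uriEncoding}`);
}

console.log(decodeQueryValue("vuln%C3%A9rable", "ISO-8859-1")); // vulnÃ©rable
console.log(decodeQueryValue("vuln%C3%A9rable", "UTF-8"));      // vulnérable
```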

crayco

--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p19048887.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hello crayco,

Thank you for your contribution.

That resolves both problems (the GET and POST methods now work).

Thank you very much !

Best regards,
Fabien Bachraty
--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p19063401.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Hi crayco,

I can confirm this setting works fine for us too.
It should be written down in a FAQ or somewhere else (I've already seen several posts looking for such a solution).

Regards
Sylvain


Sylvain Grellet wrote:

> It should be written down in a FAQ or somewhere else (I've already
> seen several posts looking for such a solution).

You're very welcome. I wrote about this a year ago, but it's not always easy
to find your way through all the posted messages. Anyway, I just submitted
a new FAQ entry about it.
Cheers!
crayco
--
View this message in context: http://www.nabble.com/Problem-with-accentuated-char-on-post-and-get-methods-tp18892552p19067524.html
Sent from the geonetwork-users mailing list archive at Nabble.com.