[SAC] [OSGeo] #2081: wp: Data corruption on profile save (encoding problem)

#2081: wp: Data corruption on profile save (encoding problem)
---------------------+----------------------
Reporter: marisn | Owner: webcom@…
     Type: defect | Status: new
Priority: normal | Milestone:
Component: WebSite | Keywords:
---------------------+----------------------
Seems that new WP site is not happy with my name and insists it should be
M?ris instead of Māris (a slight, but significant difference).

How to test:
* open profile page https://staging.www.osgeo.org/wp-admin/profile.php
* change your "First Name" to "Māris" (or "ēŗūīōāšģķļžčņ" for better
effect)
* hit "Update profile" button
* go back to "First Name" field ("???š???ž??" – really?)

Either non-latin letters should be instantly rejected before save or saved
as is (not lost).
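
The symptom can be reproduced outside WordPress. The following is a minimal sketch, assuming the text is forced through a single-byte Western European charset on save; Python's cp1252 codec stands in here for MySQL's latin1, which is an approximation, and `store_as` is a hypothetical helper, not anything from the site's code:

```python
def store_as(text: str, charset: str) -> str:
    """Round-trip text through a charset, replacing unmappable characters with '?'."""
    return text.encode(charset, errors="replace").decode(charset)


# 'ā' has no cp1252 code point, so it is replaced on the way in.
print(store_as("Māris", "cp1252"))
# UTF-8 can represent every character, so nothing is lost.
print(store_as("Māris", "utf-8"))
```

Under this assumption, characters such as "š" and "ž" survive because cp1252 happens to include them, while the other Latvian letters do not.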

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081>
OSGeo <http://www.osgeo.org/>
OSGeo committee and general foundation issue tracker.

Comment (by wildintellect):

Can someone verify the database was created as UTF8?
https://codex.wordpress.org/Converting_Database_Character_Sets

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081#comment:1>

Comment (by robe):

nope it was created with: latin1_swedish_ci collation

which I guess is the default for MySQL.

However staging2.www.osgeo.org, is created with utf8mb4_general_ci

Unfortunately, when I loaded the backup from staging.www.osgeo.org, the
tables kept their original encoding, so they are mostly latin1.

There are also quite a few tables with the utf8_general_ci collation.

What I can do is change the tables in staging2 to be utf8 and then maybe
we can test there and confirm that fixes the issue.
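
A mixed set of collations like this can be listed before converting anything. This is only a sketch: the query reads MySQL's `information_schema.TABLES`, the `%s` placeholder assumes a DB-API driver such as PyMySQL, and the sample rows and the `tables_to_convert` helper are hypothetical:

```python
# Query to run against the server; pass the schema name as the parameter.
COLLATION_QUERY = """
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s
"""


def tables_to_convert(rows, wanted_prefix="utf8mb4_"):
    """Return names of tables whose collation is set but not already utf8mb4."""
    return [
        name
        for name, collation in rows
        if collation and not collation.startswith(wanted_prefix)
    ]


# Hypothetical result rows, shaped like the output of COLLATION_QUERY.
sample_rows = [
    ("wp_usermeta", "latin1_swedish_ci"),
    ("wp_posts", "utf8_general_ci"),
    ("wp_options", "utf8mb4_general_ci"),
]
print(tables_to_convert(sample_rows))
```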

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081#comment:2>

Comment (by robe):

Okay I tested changing the usermeta table on staging2 that holds this info
and then tried to update my profile.
It worked after I explicitly changed the table columns.

So doing ALTER TABLE .. table_name .. at the table level is not sufficient;
the columns have to be converted as well.

Annoying. Anyway I'm going to reload the table and write a script to
convert all columns.
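
Such a script might look roughly like the sketch below. It is not the script that was actually used; it assumes MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET`, which rewrites the character columns as well as the table default, and the table names are hypothetical examples using the stock WordPress prefix:

```python
def conversion_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE statement per table that converts its columns too."""
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


# Hypothetical table names; the real list would come from the database.
for statement in conversion_statements(["wp_users", "wp_usermeta"]):
    print(statement)
```

The statements are only printed here, so they can be reviewed and run against a copy of the data first.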

Now the only question is which UTF-8 collation to use. utf8mb4_unicode_ci
seems to be the more standard, preferred choice, so I'll try to go with that.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081#comment:3>

Comment (by robe):

I reloaded the data from https://staging.www.osgeo.org to

  https://staging2.www.osgeo.org

and then converted all the tables and columns to the utf8mb4 character set
and the utf8mb4_unicode_ci collation.

After that I tried the above two examples, and the input text is no longer
mangled.

I went with utf8mb4 because MySQL's utf8 is a 3-byte-per-character encoding,
so it doesn't support all of UTF-8. The downside is that utf8mb4 doesn't work
with really old MySQL drivers or MySQL servers.
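
To illustrate the 3-byte limit, here is a small sketch that checks whether every character of a string encodes to at most three UTF-8 bytes. The helper name `fits_in_mysql_utf8` is made up for this example:

```python
def fits_in_mysql_utf8(text: str) -> bool:
    """True if each character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_mysql_utf8("Māris"))        # Latvian letters need 2 bytes each
print(fits_in_mysql_utf8("\U0001F600"))   # an emoji needs 4 bytes
```

So the Latvian letters from this ticket would fit in either character set; utf8mb4 matters for characters outside the Basic Multilingual Plane, such as emoji.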

I did a quick test on staging.www.osgeo.org to confirm that it supports
utf8mb4.

At any rate, I'm hesitant to make the change on staging.www.osgeo.org without
people looking at staging2 to confirm nothing got mangled in the process.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081#comment:4>

#2081: wp: Data corruption on profile save (encoding problem)
---------------------+-----------------------
Reporter: marisn | Owner: webcom@…
     Type: defect | Status: closed
Priority: normal | Milestone:
Component: WebSite | Resolution: fixed
Keywords: |
---------------------+-----------------------
Changes (by robe):

* status: new => closed
* resolution: => fixed

Comment:

I've moved the site over to web18a.osuosl.org and also converted all the
tables to utf8mb4.

This should fix the aforementioned problem.

To test, I updated Māris' name

https://www.osgeo.org/member/nartiss/

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2081#comment:5>