Home  >  Q&A  >  body text

Weird character encoding for storing data, old script shows them fine, new script doesn't

I'm trying to rewrite an old website.

It is in Persian and uses Persian/Arabic characters.

CREATE DATABASE `db` DEFAULT CHARACTER SET utf8 COLLATE utf8_persian_ci;
USE `db`;

Almost all my tables/columns have COLLATE set to utf8_persian_ci

I'm using codeigniter for my new script, and I've

'char_set' => 'utf8',
'dbcollat' => 'utf8_persian_ci',

is in the database settings, so no problem.

So this is the weird part

The old scripts used some kind of database engine called TUBADBENGINE or TUBA DB ENGINE... nothing special.

When I entered some data (in Farsi) into the database using an old script, when I looked at the database, the characters were stored as Ø1مران .

The old script gets/displays the data fine, but the new script displays them using the same weird font/charset as the database

So when I type rather the data stored in the database looks like Ø1مرا٠and when I get it in a new script I get See Ø1مرا٠but in the old script I see ?????

CREATE TABLE IF NOT EXISTS `tnewsgroups` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `fName` varchar(200) COLLATE utf8_persian_ci DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci AUTO_INCREMENT=11 ;

--
-- Dumping data for table `tnewsgroups`
--

INSERT INTO `tnewsgroups` (`ID`, `fName`) VALUES
(1, 'عمران'),
(2, 'معماری'),
(3, 'برق'),
(4, 'مکانیک'),
(5, 'test'),
(6, 'test2');

On the other hand, when I enter ??? directly into the database

Of course, I stored the same in the database rather

The new script displays very well

but in the old script I get ? ? ?

Can anyone understand this?

This is a large engine

https://github.com/maxxxir/mz-codeigniter-crud/blob/master/tuba.php

Usage example of old script:

define("database_type" , "MYSQL");
define("database_ip" , "localhost");
define("database_un" , "root");
define("database_pw" , "");
define("database_name" , "nezam2");
define("database_connectionstring" , "");
$db = new TUBADBENGINE(database_type , database_ip , database_un , database_pw , database_name , database_connectionstring);
$db->Select("SELECT * FROM tnews limit 3");
if ($db->Lasterror() != "") { echo "<B><Font color=red>ÎØÇ ! áØÝÇ ãÌÏøÏÇ ÊáÇÔ ˜äíÏ";  exit(); }
for ($i = 0 ; $i < $db->Count() ; $i++) {
    $row = $db->Next();
    var_dump($row);
}

P粉141455512P粉141455512283 days ago334

reply all(2)I'll reply

  • P粉295616170

    P粉2956161702024-01-11 16:03:06

    deceze's answer is very good, but I can add some information that might help with handling a large number of records without having to test them manually.

    If the conversion CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) fails, NULL will be printed instead of the field_name content.

    So I use this to find those records:

    SELECT IFNULL(
        CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8)
        , '**************************************************')
    FROM table_name

    Or this:

    SELECT id, field_name, CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8)
    FROM table_name
    WHERE CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) IS NULL

    UPDATE with this clause only affects records that have been converted successfully:

    UPDATE table_name
    SET
    field_name = CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8mb4 )
    WHERE
    CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8mb4) IS NOT NULL

    reply
    0
  • P粉920835423

    P粉9208354232024-01-11 13:05:48

    In short, because this question has been discussed a thousand times before:

    1. PHP saves a string, such as "汉字", encoded in UTF-8. The byte is E6 BC A2 E5 AD 97.
    2. It sends this string over the database connection set to latin1.
    3. The database received the bytes E6 BC A2 E5 AD 97 and thought they represented latin1 characters.
    4. Database storage characters æ¡ ¡ ¿ Li>
    5. The same process in reverse causes PHP to receive the same bytes and then treat them as UTF-8. The round trip works just fine for PHP, although the database doesn't handle characters the way it should.

    So the problem here is that the database connection is not set up correctly when the data is entered into the database. You have to convert the data in the database to the correct characters. Try this:

    SELECT CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) FROM table_name

    Maybe utf8 is not what you need, please try it. If it works, change it to a UPDATE statement to permanently update the data.

    reply
    0
  • Cancelreply