
[XML] Solution to garbled characters in UTF8 and GB2312 encoding conversion

Y2J (original)
2017-04-22 13:53:09

The reviewed information must be exported as an XML file encoded in GB2312. Because many of the collected news websites use UTF-8, garbled characters appear during the conversion.

I recently worked on a small project and ran into the problems below, so I am recording them here as a summary.
The project has two parts: one collects news data, and the other reviews the collected information and finally generates an XML file.
After the user has edited the collected data, it is exported as an Access file and then imported into the information review system. The field that stores the news text in the Access database is of type ntext, while the corresponding field in the review system's database was of type varchar(max). After the import, some apparently blank characters were garbled and showed up as question marks (?). Later testing showed that they were not ordinary space characters but a special character. After several tests, the fix turned out to be changing the column from varchar(max) to nvarchar(max); imported data then no longer had this problem.
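Why the column type matters can be illustrated with a minimal sketch. It assumes that storing text in a varchar column behaves like a round trip through a non-Unicode code page; GB2312 is used here only as an assumed example, since the real code page depends on the database collation. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class CodePageRoundTrip
{
    // Hypothetical helper: simulates storing text in a non-Unicode column by
    // encoding it to GB2312 bytes and decoding those bytes back to a string.
    public static string ThroughGb2312(string text)
    {
        // Needed on .NET Core / .NET 5+, where code page encodings are opt-in.
        // On .NET Framework "gb2312" is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding gb2312 = Encoding.GetEncoding(
            "gb2312",
            new EncoderReplacementFallback("?"),  // unencodable characters become '?'
            new DecoderReplacementFallback("?"));

        byte[] stored = gb2312.GetBytes(text);
        return gb2312.GetString(stored);
    }
}
```

Any character that the code page cannot represent comes back as a question mark, which matches the symptom described above. A Unicode column type such as nvarchar avoids the lossy step altogether.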
However, later testing showed that once an imported record was modified through the editing function of the .NET program, the information in the database was garbled again. After some research, I found that the problem does not occur if the insert statement is written like this: insert into tablename (news) values (N'" + updatedValue + "'). Why add the N? The N prefix marks the literal as a Unicode (nvarchar) string. Without it, the literal is first converted to the database's non-Unicode code page, and characters that code page cannot represent turn into question marks.
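As a rough sketch of the concatenation approach described above, the helper below builds an N-prefixed literal and doubles any single quotes inside the value so that the statement stays well-formed. `SqlLiteral`, `ToUnicodeLiteral`, `BuildInsert` and the table name `NewsTable` are hypothetical names chosen for illustration, not part of the original project.

```csharp
using System;

public static class SqlLiteral
{
    // Wraps a value in an N'...' Unicode string literal.
    // Single quotes inside the value are doubled, as T-SQL requires.
    public static string ToUnicodeLiteral(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        return "N'" + value.Replace("'", "''") + "'";
    }

    // Builds the INSERT statement text; NewsTable is an assumed table name.
    public static string BuildInsert(string news)
    {
        return "INSERT INTO NewsTable (news) VALUES (" + ToUnicodeLiteral(news) + ")";
    }
}
```

For example, `SqlLiteral.BuildInsert("abc")` yields `INSERT INTO NewsTable (news) VALUES (N'abc')`. In most cases a parameterized command with a Unicode string parameter is the safer choice, because the value never has to be spliced into the SQL text at all.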
At this point I thought I was finally in the clear, but the next problem was even more frustrating.
The reviewed information has to be exported as XML encoded in GB2312. Many of the collected news websites use UTF-8, so garbled characters appeared during the conversion (again caused by that "blank" special character). Advice on the Internet says that converting UTF-8 to GB2312 is enough, but in practice that did not solve the problem. I spent the whole morning on it without success. Then it occurred to me to use the Visual Studio debugger to see what the special character actually was. I read the field value from the database, converted it to a character array with content.ToCharArray(), and inspected the characters one by one. The character that caused the garbling was ' '. Note the space between the quotation marks: it is not an ordinary space but a special character that GB2312 cannot represent. Could I simply replace this character with an ordinary space? I tried it immediately and, sure enough, the garbling was gone. I wasted half a day on this silly thing.
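The character-by-character inspection can also be automated. The following is a minimal sketch, not the author's code: it tries to encode each character with a strict GB2312 encoder and reports the ones that fail. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Gb2312Check
{
    // Returns one entry per character that GB2312 cannot represent,
    // formatted as "index N: U+XXXX".
    public static List<string> FindUnencodable(string content)
    {
        // Needed on .NET Core / .NET 5+, where code page encodings are opt-in.
        // On .NET Framework "gb2312" is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Strict encoder: throws instead of silently substituting '?'.
        Encoding strict = Encoding.GetEncoding(
            "gb2312",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        var found = new List<string>();
        for (int i = 0; i < content.Length; i++)
        {
            // Treat a surrogate pair as a single character.
            int length = char.IsSurrogatePair(content, i) ? 2 : 1;
            int codePoint = length == 2 ? char.ConvertToUtf32(content, i) : content[i];
            try
            {
                strict.GetBytes(content.Substring(i, length));
            }
            catch (EncoderFallbackException)
            {
                found.Add(string.Format("index {0}: U+{1:X4}", i, codePoint));
            }
            i += length - 1;
        }
        return found;
    }
}
```

Printing the result for a field value read from the database shows the code point of every offending character, which is easier to act on than an invisible character in the debugger window.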
Note: you must use the value obtained from debugging, because that is the real special character causing the garbling, not a space typed on the keyboard. Copy it from the debugger and paste it into the code, as follows:

content = content.Replace(" ", " ");
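Because an invisible character is easily lost when code is copied around, it may be more robust to write it as a Unicode escape. The sketch below assumes the offending character is U+00A0 (no-break space), which is only a guess since the article does not name the code point; substitute the value actually seen in the debugger. The class, constant and method names are hypothetical.

```csharp
using System;

public static class SpecialSpaceFix
{
    // Assumption: the special character is U+00A0 (no-break space).
    // Replace this with the code point found while debugging.
    public const char SpecialSpace = '\u00A0';

    // Replaces every occurrence of the special character with an ordinary space.
    public static string ReplaceSpecialSpace(string content)
    {
        if (content == null) throw new ArgumentNullException("content");
        return content.Replace(SpecialSpace, ' ');
    }
}
```

Calling `content = SpecialSpaceFix.ReplaceSpecialSpace(content);` before generating the XML has the same effect as the pasted-character version, but the source code shows explicitly which character is being replaced.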
