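I used C++ sockets to simulate an HTTP request to a website, and the response came back garbled.
The garbling is strange because it is only partial. Specifically:
some of the Chinese text is garbled and some of the English text is garbled, but most of the response is correct.
When I checked with Wireshark, the HTTP response status was 200, and the returned HTML text was also correct. The garbling only shows up in the data received through the socket.
My approach for this program is:
In myhttp.h, I wrap a mysocket class that implements the socket connect, send, and recv operations. recv is called in non-blocking mode. To make sure all the data from the server is already in the receive buffer, I call sleep(1) after send and only then call recv. I also wrap a myhttp class that inherits from mysocket. myhttp takes the various HTTP parameters, builds an HTTP request from them, and sends it.
test.cc mainly calls myhttp.
Here is my code: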
myhttp.h
#include <cstring>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <initializer_list>
#include <unistd.h>
class mysocket{
private:
int sockfd;
struct sockaddr_in servaddr;
string response;
protected:
mysocket(string ip,string port){
const char * c_ip = ip.c_str();
const char * c_port = port.c_str();
this->sockfd = socket(AF_INET,SOCK_STREAM,0);
memset(&(this->servaddr),0,sizeof(this->servaddr));
(this->servaddr).sin_family = AF_INET;
(this->servaddr).sin_port = htons(atoi(c_port));
inet_pton(AF_INET,c_ip,&(this->servaddr).sin_addr);
}
void mysocket_connect(){
connect(this->sockfd,(struct sockaddr*)&(this->servaddr),sizeof(this->servaddr));
}
void mysocket_send(string request){
const char * c_request = request.c_str();
send(this->sockfd,c_request,strlen(c_request),0);
sleep(1);
}
void mysocket_recv(){
char buf[1024];
memset(&buf,0,sizeof(buf));
while(int recvlength = recv(this->sockfd,buf,sizeof(buf),MSG_DONTWAIT)){
if(recvlength < 0)
break;
else{
this->response += buf;
memset(&buf,0,sizeof(buf));
}
}
cout << this->response;
}
};
class myhttp:public mysocket{
private:
string ip;
string port;
//get the length of the data
string getLength(string data){
int length = data.size();
char lenstr[12] = {0};
sprintf(lenstr,"%d",length);
return string(lenstr);
}
public:
myhttp(string ip,string port):mysocket(ip,port){ //call the base class constructor
this->ip = ip;
this->port = port;
}
//path,addition
void GET(string path,initializer_list<string> args){
//handle the variable-length list of extra headers
string addition = "";
for(auto arg : args)
addition = addition + arg + "\r\n";
//build the HTTP request
string http_request = "GET " + path + " HTTP/1.1\r\nHost: " + this->ip + ":" + this->port + "\r\n" + addition + "\r\n";
this->mysocket_connect();
this->mysocket_send(http_request);
this->mysocket_recv();
//while(1){}
}
//path,data,contentType,addition
void POST(string path,string data,string contentType,initializer_list<string> args){
//handle the variable-length list of extra headers
string addition = "";
for(auto arg : args)
addition = addition + arg + "\r\n";
//get the length of data
string length = this->getLength(data);
//build the HTTP request
string http_request = "POST " + path + " HTTP/1.1\r\nHost: " + this->ip + ":" + this->port + "\r\nContent-Type: " + contentType + "\r\ncontent-length: " + length + "\r\n" + addition + "\r\n" + data;
this->mysocket_connect();
this->mysocket_send(http_request);
this->mysocket_recv();
//while(1){}
}
};
test.cc
#include <iostream>
#include <fstream>
using namespace std;
#include "myhttp.h"
int main(){
myhttp * Myhttp = new myhttp("123.57.236.235","80");
Myhttp->GET("/Public/js/jquery.min.js",{});
return 0;
}
The garbled part (for example, in the text returned when requesting a jQuery file):
伊谢尔伦2017-04-17 13:27:08
Looks like I'll have to answer this myself! It is finally solved, and I hope it helps others later.
Here is my analysis and how I solved it:
First, I noticed something:
The two garbled positions are 1024 characters apart, and in my program the buffer passed to recv is char buf[1024]. Can the two 1024s really be a coincidence?
Then I tried calling strlen(buf) after each recv. In theory it should never report more than 1024, right? Instead, strlen(buf) returned 1030, which is 6 more than the buffer size. So I wondered: does the garbling come from those 6 extra bytes?
So I changed the length argument passed to recv:
recv(this->sockfd,buf,(sizeof(buf) - 6),MSG_DONTWAIT)
With this change, the garbled characters disappeared.
That is the solution.
However, I still haven't figured out the cause. Could anyone give me some advice?
怪我咯2017-04-17 13:27:08
It is best not to use cpp to do this. It is a headache to process strings.
ringa_lee2017-04-17 13:27:08
Where did the garbled text you posted come from? Was it printed by the terminal?
I recommend not judging by the printed output at first. Instead, save the output to a file and open that file in a text editor to see whether it is still garbled.
It is possible that the garbled characters you see are just an encoding problem in your terminal and have nothing to do with your code.
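As a rough sketch of that suggestion, the received bytes could be written to a file in binary mode and then inspected with a text or hex editor. The function name save_response and the file name in the usage note are hypothetical and do not appear in the posted code.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write the raw response bytes to a file without any
// text-mode translation, so the file holds exactly what the socket delivered.
bool save_response(const std::string& response, const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;  // the file could not be opened
    out.write(response.data(), static_cast<std::streamsize>(response.size()));
    return static_cast<bool>(out);  // false if the write failed
}
```

For example, mysocket_recv could call save_response(this->response, "response.bin") instead of printing with cout. If the file looks fine while the terminal output does not, the terminal encoding is the more likely culprit.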