Can't you just concatenate the domain with the img src attribute you scraped?
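
The suggestion above can be sketched with `java.net.URI`, which resolves a scraped `src` against the page URL instead of joining strings by hand. This is only an illustration: the page URL and image paths below are hypothetical, and `resolve` will throw `IllegalArgumentException` if the `src` contains illegal characters such as a raw space.

```java
import java.net.URI;

class ResolveSrcSketch {
    // Resolves a src attribute (absolute or relative) against the page it came from.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only for illustration.
        String page = "http://www.neofactory.co.jp/product/detail.html";
        // A root-relative src replaces the whole path.
        System.out.println(resolve(page, "/img/product/sample.jpg"));
        // A plain relative src is resolved against the page's directory.
        System.out.println(resolve(page, "img/product/sample.jpg"));
    }
}
```

Whether the result matches what the server expects depends on how the site writes its `src` values, so it is worth printing the resolved link and comparing it with the one that works in the browser.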

Web crawler - Java crawler has the image link but cannot download the image

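The src of the image resource that my crawler extracts from the HTML looks like this:

But when my code turns that resource into a link and downloads the image, the request fails with a 400 error.

However, when I test the link in Chrome to see whether it exists, I find that what the site's server actually recognizes is:

In other words, the image link I get from the page is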
http://www.neofactory.co.jp/i... 2.jpg
whereas the link that actually returns the image is
http://www.neofactory.co.jp/i...

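Could anyone advise what to do next? I have searched a lot online and still have not found a solution.
PS: Strangely, in Firefox the first link above also returns the image, which I cannot explain.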
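
One possible explanation, offered as a guess because the links above are truncated: the scraped link appears to contain a space before `2.jpg`. Browsers typically percent-encode such characters before sending the request, while `java.net.URL` passes the string through unchanged, and a raw space in the request line is a common reason for a 400 response. A minimal sketch of encoding the path with the multi-argument `java.net.URI` constructor follows; the helper name `encode` and the path `img/product/sample 2.jpg` are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class EncodeUrlSketch {
    // Builds an http URL whose path has illegal characters (e.g. spaces) percent-encoded.
    static String encode(String host, String rawPath) {
        String path = rawPath.startsWith("/") ? rawPath : "/" + rawPath;
        try {
            // The multi-argument constructor quotes illegal characters in the path.
            return new URI("http", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical path containing a space, used only for illustration.
        System.out.println(encode("www.neofactory.co.jp", "img/product/sample 2.jpg"));
        // prints http://www.neofactory.co.jp/img/product/sample%202.jpg
    }
}
```

Note that this constructor also encodes a literal `%`, so it should be given the raw, not yet encoded, path; passing an already encoded path would encode it twice.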

Code:

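import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumed package for JXDocument (older JsoupXpath releases); adjust to the version in use.
import cn.wanghaomiao.xpath.model.JXDocument;

public class Image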
{

private String urlNeo="";
public String getUrlNeo() {
    return urlNeo;
}
public void setUrlNeo(String urlNeo) {
    this.urlNeo = urlNeo;
}
public String getHtml() throws Exception{
    ArrayList<String> list=new ArrayList<String>();    
    String line="";
    String Html="";
    URL url=new URL(urlNeo);
    URLConnection connection=url.openConnection();
    InputStream in=connection.getInputStream();
    InputStreamReader isr=new InputStreamReader(in);
    BufferedReader br=new BufferedReader(isr);
    while((line=br.readLine())!=null){
        Html+=line;
        list.add(line);
    }
    br.close();
    isr.close();
    in.close();
    return Html;
}
public String getImgSrc() throws Exception{
    String html=getHtml();
    String IMGURL_REG_xpath="//p[1]/p[2]/p[2]/p/node()";
    String imginfomation="";
    JXDocument jxDocument = new JXDocument(html);
    imginfomation=(jxDocument.sel(IMGURL_REG_xpath).toString()).substring(1,jxDocument.sel(IMGURL_REG_xpath).toString().length() - 1);
    return imginfomation;
}
public List<String> getImgXpath() throws Exception{
    String str="";
    // The backslash must be doubled inside a Java string literal.
    String IMGSRC_REG = "img.product.\\w.*.jpg";
    List<String> list1=new ArrayList<String>();
    List<String> list2=new ArrayList<String>();
    String listimg = getImgSrc();
    Matcher matcher = Pattern.compile(IMGSRC_REG).matcher(listimg);
    while (matcher.find()) {
        list1.add(matcher.group());
    }
    for(int i=1;i<=(list1.size()/2);i++){
        int j=i*2;
        list2.add(list1.get(j-1));
    }
    return list2;
}
public void download(String admin_no) throws Exception{
    List<String> list=new ArrayList<String>();
    list=getImgXpath();
    for(String img:list){
        System.out.println(img);
        String url="http://www.neofactory.co.jp/"+img;
        URL uri=new URL(url);
        URLConnection con=uri.openConnection();
        con.setConnectTimeout(5000);
        InputStream in=con.getInputStream();
        
        byte[] buf=new byte[1024];
        int length=0;            
        // Backslashes in Windows paths must be escaped in Java string literals.
        File sf=new File("D:\\item_neo_photo\\"+admin_no);
        if(!sf.exists()){
            sf.mkdirs();
        }
        String[] a=img.split("/");
        OutputStream os=new FileOutputStream(sf.getPath()+File.separator+a[a.length-1]);
        
        while((length=in.read(buf))!=-1){
            os.write(buf, 0, length);
        }
        
        os.close();
        in.close();
    }
}

}
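
As a side note on the `download` method above, the copy loop can be written more compactly with `java.nio.file.Files`, and try-with-resources closes the stream even when an exception is thrown. The following is a sketch, not a drop-in replacement: the class and method names are hypothetical, and the demo uses an in-memory stream and a temporary directory instead of a real connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SaveImageSketch {
    // Copies the stream into dir/fileName, creating dir if it does not exist yet.
    static Path save(InputStream in, Path dir, String fileName) {
        try (InputStream source = in) {
            Files.createDirectories(dir);
            Path target = dir.resolve(fileName);
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: writes three bytes to a temporary directory and returns the file size.
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("item_neo_photo");
            Path saved = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), dir, "sample.jpg");
            return Files.size(saved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the original method this would be called along the lines of `save(con.getInputStream(), Paths.get("D:\\item_neo_photo", admin_no), a[a.length-1])`, after importing `java.nio.file.Paths`.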

黄舟 · 2837 days ago · 628

All replies (2)