While downloading web pages with Java's HttpURLConnection, I found that requests to Google's sites were rejected by Google.
try
{
url = new URL(urlStr);
httpConn = (HttpURLConnection) url.openConnection();
HttpURLConnection.setFollowRedirects(true);
// logger.info(httpConn.getResponseMessage());
in = httpConn.getInputStream();
out = new FileOutputStream(new File(outPath));
chByte = in.read();
while (chByte != -1)
{
out.write(chByte);
chByte = in.read();
}
}
catch (IOException e) // also covers MalformedURLException, which is a subclass of IOException
{
e.printStackTrace();
}
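After some research and digging through references, I found the cause: the code above leaves out some information the server expects. Setting more detailed request properties fixes it: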
httpConn.setRequestMethod("GET");
httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
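To see the effect of the User-Agent header without depending on Google, here is a minimal sketch that runs against a throwaway local server. The server, its rule of rejecting any agent that does not start with Mozilla/, and the names UserAgentDemo, startPickyServer and fetchStatus are all made up for illustration; Google's real filtering rules are not known to me. The sketch uses the JDK's built-in com.sun.net.httpserver.HttpServer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class UserAgentDemo {
    /** Starts a local test server (hypothetical rule): 200 for browser-like agents, 403 otherwise. */
    static HttpServer startPickyServer() throws IOException {
        // Port 0 lets the operating system pick any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            int status = (ua != null && ua.startsWith("Mozilla/")) ? 200 : 403;
            exchange.sendResponseHeaders(status, -1); // -1 means "no response body"
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends a GET and returns the HTTP status; a null userAgent keeps the JDK default. */
    static int fetchStatus(String urlStr, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setRequestMethod("GET");
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() reports 4xx codes without throwing, unlike getInputStream().
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling fetchStatus with a null userAgent leaves the JDK's default agent string in place (it starts with Java/ unless the http.agent system property changes it), which this toy server answers with 403. Passing the MSIE string from the lines above yields 200.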
The complete code is as follows:
public static void DownLoadPages(String urlStr, String outPath)
{
int chByte = 0;
URL url = null;
HttpURLConnection httpConn = null;
InputStream in = null;
FileOutputStream out = null;
try
{
url = new URL(urlStr);
httpConn = (HttpURLConnection) url.openConnection();
HttpURLConnection.setFollowRedirects(true);
httpConn.setRequestMethod("GET");
httpConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
// logger.info(httpConn.getResponseMessage());
in = httpConn.getInputStream();
out = new FileOutputStream(new File(outPath));
chByte = in.read();
while (chByte != -1)
{
out.write(chByte);
chByte = in.read();
}
}
catch (MalformedURLException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
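// Close each resource on its own and skip any that was never opened,
// so a failure on one does not leak the others.
try
{
if (out != null) out.close();
}
catch (IOException ex)
{
ex.printStackTrace();
}
try
{
if (in != null) in.close();
}
catch (IOException ex)
{
ex.printStackTrace();
}
if (httpConn != null) httpConn.disconnect();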
}
}
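The read loop in DownLoadPages moves one byte per call, which gets slow for large pages because neither stream is buffered. As a rough sketch of an alternative, the helper below copies through a fixed-size buffer. The class name StreamCopy, the method name copy and the 8192-byte buffer size are my own choices for illustration, not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    /**
     * Copies everything from in to out through a fixed-size buffer.
     * Returns the number of bytes copied. Does not close either stream.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns the number of bytes placed in the buffer, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Inside DownLoadPages, the while loop could then shrink to a single StreamCopy.copy(in, out) call. On Java 7 or later, declaring in and out in a try-with-resources statement would also take care of closing them.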
There is also a second way to access Google's sites: use Apache's HttpClient library (the snippet below uses the Commons HttpClient 3.x API) to imitate a browser.
Document document = null;
HttpClient httpClient = new HttpClient();
GetMethod getMethod = new GetMethod(url);
getMethod.setFollowRedirects(true);
int statusCode = httpClient.executeMethod(getMethod);
if (statusCode == HttpStatus.SC_OK)
{
InputStream in = getMethod.getResponseBodyAsStream();
InputSource is = new InputSource(in);
DOMParser domParser = new DOMParser(); // NekoHTML parser: converts the fetched page into a DOM
domParser.parse(is);
document = domParser.getDocument();
System.out.println(getMethod.getURI());
}
return document;
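I recommend the first approach: HttpURLConnection is more lightweight and also faster than the second approach with HttpClient.
Below is some reposted code that uses HttpURLConnection to simulate an IE form login to a website:
On simulating an IE form login to a website in Java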
HttpURLConnection urlConn=(HttpURLConnection)(new URL(url).openConnection());
urlConn.addRequestProperty("Cookie",cookie);
urlConn.setRequestMethod("POST");
urlConn.setRequestProperty("User-Agent","Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
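urlConn.setInstanceFollowRedirects(true); // per-connection setting; setFollowRedirects is static
urlConn.setDoOutput(true); // we need to write data to the server
urlConn.setDoInput(true); // we also read the response
urlConn.setUseCaches(false); // always get the latest data from the server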
urlConn.setAllowUserInteraction(false);
urlConn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
urlConn.setRequestProperty("Content-Language","en-US" );
urlConn.setRequestProperty("Content-Length", ""+data.length());
DataOutputStream outStream = new DataOutputStream(urlConn.getOutputStream());
outStream.writeBytes(data);
outStream.flush();
outStream.close();
cookie=urlConn.getHeaderField("Set-Cookie");
BufferedReader br=new BufferedReader(new InputStreamReader(urlConn.getInputStream(),"gb2312"));
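The login snippet never shows how the data string is built. For a Content-Type of application/x-www-form-urlencoded it has to be a sequence of URL-encoded name=value pairs joined with &. A minimal sketch of one way to build it follows. The class name FormEncoder, the field names in the usage note and the choice of UTF-8 are assumptions for illustration; a site that expects gb2312 form data would need that charset instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

class FormEncoder {
    /** Joins the fields as name=value pairs separated by '&', URL-encoding both parts. */
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            // Assumption: the server expects UTF-8 form data.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }
}
```

With a LinkedHashMap holding the hypothetical fields username=alice and password=p@ss word (in that order), encode returns username=alice&password=p%40ss+word, which can be passed as data to outStream.writeBytes(data). Because the encoded result is plain ASCII, data.length() also matches the number of bytes written.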
Source of this article:
http://www.blogjava.net/fisher/articles/86926.aspx