I previously posted an article on how to extract the <title> from a page's HTML source using URLConnection.
/index.php/java/339-get-from-remote-web-page-httpurlconnection
More recently, there was also a post introducing Apache HttpClient 4.5.2.
/index.php/miscellaneous/396-3-apache-news-rave
Below is a simple class that uses Apache HttpClient to fetch the HTML source of a page.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class Http
{
    private static final String TARGET_URL = "http://apache.org";

    // ClientProtocolException is a subclass of IOException, so declaring IOException covers both.
    public void printHTMLSource() throws IOException
    {
        HttpGet request = new HttpGet(TARGET_URL);

        // try-with-resources closes the response and the client, releasing the connection.
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(request))
        {
            System.out.println("- Response Code : "
                    + response.getStatusLine().getStatusCode());

            StringBuilder htmlSource = new StringBuilder();

            // Assumes the page is UTF-8 encoded instead of relying on the platform default charset.
            try (BufferedReader br = new BufferedReader(new InputStreamReader(
                    response.getEntity().getContent(), StandardCharsets.UTF_8)))
            {
                String line;
                while ((line = br.readLine()) != null)
                {
                    // readLine() strips the line terminator, so put a newline back.
                    htmlSource.append(line).append('\n');
                }
            }

            System.out.println("- Result : " + htmlSource);
        }
    }
}
I used the following libraries:
- httpclient-4.5.2.jar
- httpcore-4.4.4.jar
- commons-logging-1.2.jar
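
To tie this back to the earlier <title> post: once the HTML source is held in a String, the title can be pulled out with the JDK alone. The following is only a minimal sketch. The class name TitleExtractor and the method extractTitle are hypothetical and appear neither in the original posts nor in HttpClient, and the regular expression is a simplification that assumes a single, well-formed <title> element. A real HTML parser would be more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original post or of HttpClient.
final class TitleExtractor
{
    // CASE_INSENSITIVE matches <TITLE> as well; DOTALL lets the title span several lines.
    // Attributes on the opening tag are tolerated. This is a regex sketch, not an HTML parser.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first <title> element, or empty if none is found.
    static Optional<String> extractTitle(String html)
    {
        if (html == null)
        {
            return Optional.empty();
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args)
    {
        // A fixed sample string stands in for the HTML source fetched over HTTP.
        String sample = "<html><head><title> Sample Page </title></head><body></body></html>";
        System.out.println(extractTitle(sample).orElse("(no title)")); // prints "Sample Page"
    }
}
```

In the class above, the string built in htmlSource could be passed to such a helper instead of being printed in full.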