I previously posted about how to use URLConnection to pull the <title> out of a page's HTML source.

http://sarc.io/index.php/java/339-get-from-remote-web-page-httpurlconnection

More recently, there was an introduction to Apache HttpClient 4.5.2.

http://sarc.io/index.php/miscellaneous/396-3-apache-news-rave

 

Below is a simple class that uses Apache HttpClient to fetch the HTML source of a page.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class Http
{
  private static final String TARGET_URL = "http://apache.org";

  // ClientProtocolException is a subclass of IOException, so declaring IOException covers both.
  public void printHTMLSource() throws IOException
  {
    HttpGet request = new HttpGet(TARGET_URL);

    // try-with-resources closes the response and the client, releasing the connection.
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(request))
    {
      System.out.println("- Response Code : "
          + response.getStatusLine().getStatusCode());

      StringBuilder htmlSource = new StringBuilder();

      // Assumes the page is UTF-8 encoded; the original relied on the platform default charset.
      try (BufferedReader br = new BufferedReader(new InputStreamReader(
          response.getEntity().getContent(), StandardCharsets.UTF_8)))
      {
        String line;
        while ( (line = br.readLine()) != null )
        {
          // readLine() strips the line terminator, so add it back to keep the source readable.
          htmlSource.append(line).append('\n');
        }
      }
      System.out.println("- Result : " + htmlSource);
    }
  }

  public static void main(String[] args) throws IOException
  {
    new Http().printHTMLSource();
  }
}
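
Since the earlier post was about getting the <title> out of the HTML source, here is a rough sketch of how the stream-reading part could be separated out and combined with a title lookup. It uses only the JDK, so it works with any InputStream, including the one returned by response.getEntity().getContent(). The class name HtmlSourceUtil and the methods readAll and extractTitle are hypothetical names I made up for illustration, not part of Apache HttpClient. The sketch also assumes UTF-8 content, and the regular expression is only a simple approximation, not a real HTML parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; JDK only, not part of Apache HttpClient.
class HtmlSourceUtil
{
  // Matches the first <title>...</title> pair, ignoring case and spanning line breaks.
  private static final Pattern TITLE = Pattern.compile(
      "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Reads the whole stream as text (assumed UTF-8), ending every line with '\n'.
  static String readAll(InputStream in) throws IOException
  {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(
        new InputStreamReader(in, StandardCharsets.UTF_8)))
    {
      String line;
      while ( (line = br.readLine()) != null )
      {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  // Returns the trimmed title text, or null when no <title> element is found.
  static String extractTitle(String html)
  {
    Matcher m = TITLE.matcher(html);
    return m.find() ? m.group(1).trim() : null;
  }
}
```

With the HttpClient class above, the usage would be along the lines of HtmlSourceUtil.extractTitle(HtmlSourceUtil.readAll(response.getEntity().getContent())), called before the response is closed.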

 

The libraries I used are as follows.

  • httpclient-4.5.2.jar
  • httpcore-4.4.4.jar
  • commons-logging-1.2.jar