Need help getting the HTML of a website in Java


I've got some code, and I use pretty much the same code every time, to get HTML from websites in Java. It works for everything except one particular website.

I am trying to get the HTML of the website whose URL appears in the code below, but I keep getting junk characters. The same code works fine with any other website.

And this is the code I am using:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public static String printHTML() {
        URL url = null;
        try {
            url = new URL("http://www.geni.com/genealogy/people/William-Jefferson-Blythe-Clinton/6000000001961474289");
        } catch (MalformedURLException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }

        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) url.openConnection();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Pretend to be a desktop browser.
        connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
        try {
            System.out.println(connection.getResponseCode());
        } catch (IOException e) {
            e.printStackTrace();
        }

        String line;
        StringBuilder builder = new StringBuilder();
        BufferedReader reader = null;
        try {
            // Reads the raw response bytes using the platform default charset.
            reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            while ((line = reader.readLine()) != null) {
                builder.append(line);
                builder.append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        String html = builder.toString();
        System.out.println("HTML " + html);
        return html;
    }

I do not understand what is different about the URL mentioned above, or why the code does not work with it.

Any help would be appreciated.

This site gzips its response regardless of the client's capabilities. Normally a server should only gzip the response when the client indicates that it supports it (through the Accept-Encoding request header). You need to decompress it using GZIPInputStream.

  reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream()), "UTF-8"));

Note that I also passed the correct charset to the InputStreamReader constructor. Normally you would extract it from the Content-Type header of the response.
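Both points can be sketched together, so that the same method still works for sites that do not gzip. This is only a minimal sketch: the class name `ResponseDecoder` and the helpers `charsetOf` and `readBody` are hypothetical names introduced here, and falling back to UTF-8 when the header names no usable charset is an assumption, not something the response guarantees. The `main` method builds a gzipped body in memory instead of contacting the site.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ResponseDecoder {

    // Hypothetical helper: extracts the charset from a Content-Type value such as
    // "text/html; charset=UTF-8". Falls back to UTF-8 (an assumption) otherwise.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Hypothetical helper: decompresses only when the response says it is gzipped,
    // then decodes the text with the charset taken from the Content-Type header.
    static String readBody(InputStream raw, String contentEncoding, String contentType)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, charsetOf(contentType)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append("\n");
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzipped response body locally rather than using the network.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8));
        }
        String html = readBody(new ByteArrayInputStream(buffer.toByteArray()),
                "gzip", "text/html; charset=UTF-8");
        System.out.print(html);
    }
}
```

With a real connection, the call would presumably look like `readBody(connection.getInputStream(), connection.getContentEncoding(), connection.getContentType())`, since both getters are available on URLConnection and return null when the header is absent.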

If what you ultimately want is to parse or extract information from the HTML, I strongly recommend using an HTML parser such as Jsoup instead.

