$ curl https://secretWebSite.com/

Question

Dear Wise & Powerful Masters of Technology,

I fear that curl is not downloading my secret web site correctly. I get this:

$ curl https://secretWebSite.com/
<head><title>Object moved</title></head>
<body>

Object Moved

This object may be found here.</body>
$

This is NOT what I expected.

How can I fool the secretWebSite into thinking that my call from curl is from Internet Explorer or Safari?

Your humble geek,

Kurt

ibook, Mac OS X (10.2.x)

Posted on Jul 18, 2006 3:06 PM

Answer 1

Jul 18, 2006 3:20 PM in response to Kurt Sakaeda

Kurt,

Have you tried the -L option?

-L/--location
(HTTP/HTTPS) If the server reports that the requested page has a
different location (indicated with the header line Location:)
this flag will let curl attempt to reattempt the get on the new
place. If used together with -i/--include or -I/--head, headers
from all requested pages will be shown. If authentication is
used, curl will only send its credentials to the initial host,
so if a redirect takes curl to a different host, it won't intercept
the user+password. See also --location-trusted on how to
change this.

If this option is used twice, the second will again disable
location following.
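
A minimal sketch of how that might be used, wrapped in two small shell functions. The function names, the 5-hop cap, and the placeholder URL are illustrative assumptions, not part of the man page excerpt. The first function shows where the server is pointing without following it; the second follows the redirect with a lower limit than curl's default of 50.

```shell
# Print the Location: header(s) a URL answers with, without following them.
# -s silences the progress meter, -o /dev/null discards the body,
# -D - dumps the response headers to stdout.
show_redirect() {
    curl -s -o /dev/null -D - "$1" | grep -i '^location:'
}

# Fetch a URL and follow redirects, giving up after 5 hops
# (an arbitrary cap chosen for this sketch).
fetch_following() {
    curl -sL --max-redirs 5 "$1"
}

# Usage (placeholder URL from the original post):
#   show_redirect https://secretWebSite.com/
#   fetch_following https://secretWebSite.com/
```

If show_redirect prints the same URL you asked for, -L alone is unlikely to help.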

Andy

Answer 2

Camelot

Jul 18, 2006 5:29 PM in response to Kurt Sakaeda

>How can I fool the secretWebSite into thinking that my call from curl is from Internet Explorer or Safari?

If you're sure that the web server is refusing your connection due to user agent filtering, you can use curl's -A switch:

curl -A "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3" https://secretWebSite.com/

(all one line, of course)

Answer 3

Jul 18, 2006 5:48 PM in response to Camelot

Camelot,

I'm curious: how do you go about finding what the valid User-Agent strings are? Is there a listing somewhere?

And how do you determine which one is needed? Would looking at the web page's source code help?

Andy

Answer 4

Jul 18, 2006 5:56 PM in response to Kurt Sakaeda

Hi Kurt,
The HTML in your post suggests that loading the page in question involves a redirect, but you've made no mention of that aspect of the page, so unlike Andy, I'm not inclined to discuss that use of curl. However, you asked how to convince the server that the request is coming from Safari. (I don't use M$ IE.) That involves setting the "User-Agent" field in the HTTP header of the request, which can be accomplished with curl's -A or --user-agent option. That part of the command would look like:

curl -A 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3'

The string following the -A option above is the User-Agent string that Safari uses by default.

As an aside, the page you're loading is an https secure page. Curl can negotiate a secure socket with absolutely no problem. However, like a browser, curl attempts to verify the server certificate and returns an error if the verification fails. Curl allows you to turn off this verification with the -k option.
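
Putting the two options together might look something like the sketch below. The function names and the SAFARI_UA variable are made up for illustration, and the URL in the usage comment is the placeholder from the original post.

```shell
# Safari's default User-Agent string, as quoted above.
SAFARI_UA='Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3'

# Fetch a URL as "Safari" with normal certificate verification.
fetch_as_safari() {
    curl -A "$SAFARI_UA" "$1"
}

# Same request, but -k skips certificate verification. Only reasonable
# against a server you already trust, such as a test box with a
# self-signed certificate.
fetch_as_safari_insecure() {
    curl -k -A "$SAFARI_UA" "$1"
}

# Usage (placeholder URL from the original post):
#   fetch_as_safari https://secretWebSite.com/
#   fetch_as_safari_insecure https://secretWebSite.com/
```

I'd try the first form and only reach for -k if curl actually complains about the certificate.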
--
Gary
~~~~
I think there's a world market for about five computers.
-- attr. Thomas J. Watson (Chairman of the Board, IBM), 1943

Answer 5

Jul 18, 2006 6:32 PM in response to Nils C. Anderson

Hi Andy,
Oops, it looks like Camelot beat me to the User-Agent string by quite a bit. I'm interested in hearing how he did it as well, since his method is clearly simpler than mine. For what it's worth, I just sniffed packets, dumped the HTTP GET request packet as text and copied the User-Agent string from that.

The User-Agent field won't make any difference unless the server is configured to consider it in the formulation of its response. However, I've never done that so I can't really say anything about how it's done. One would assume that the administrator of the server would know if it has been so configured but I suppose there could be options that do this "behind the scenes". However, I was curious about whether that was the case with the secure page I used to test so what I did was to download the page with and without the User-Agent specified and then I compared the results. Since the pages were the same, I assume that the User-Agent made no difference. Obviously if there had been a difference, that would reflect the effect of the User-Agent field.
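
For anyone who wants to repeat that comparison, a rough sketch of it as a shell function follows. The function name and messages are my own invention, and it assumes mktemp and cmp are available.

```shell
# Download a URL twice, once with curl's default User-Agent and once with
# the one supplied, then report whether the two response bodies differ.
compare_user_agents() {
    url=$1
    ua=$2
    tmp_default=$(mktemp) || return 2
    tmp_custom=$(mktemp) || return 2
    curl -s -o "$tmp_default" "$url"
    curl -s -A "$ua" -o "$tmp_custom" "$url"
    if cmp -s "$tmp_default" "$tmp_custom"; then
        echo "same: User-Agent made no visible difference"
    else
        echo "different: the server may vary its response by User-Agent"
    fi
    rm -f "$tmp_default" "$tmp_custom"
}

# Usage (placeholder URL from the original post):
#   compare_user_agents https://secretWebSite.com/ 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3'
```

A "different" result is only a hint. Pages that embed timestamps or session tokens will differ between any two requests, so it is worth running the same request twice as a baseline.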
--
Gary
~~~~
How should I know if it works? That's what beta testers
are for. I only coded it.
-- Attributed to Linus Torvalds, somewhere in a posting

Answer 6

Camelot

Jul 19, 2006 4:34 PM in response to Nils C. Anderson

>I'm curious: how do you go about finding what the valid User-Agent strings are? Is there a listing somewhere?

Well, Apache will log this data, so one option is to look at your web server logs to find some samples.
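
As a sketch of the log approach, assuming the server writes Apache's "combined" log format (the User-Agent is the last quoted field on each line). The function name is hypothetical, and the log path in the usage comment is a guess that varies by system.

```shell
# List the distinct User-Agent strings in an Apache "combined" format log,
# most frequent first. Splitting each line on double quotes puts the
# request in field 2, the referer in field 4 and the User-Agent in field 6.
list_user_agents() {
    awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -rn
}

# Usage (the path is an assumption; check where your server logs):
#   list_user_agents /var/log/httpd/access_log | head
```

This simple split will misread lines where a field contains an escaped double quote, so treat the counts as approximate.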

Alternatively you can use tcpdump to monitor the traffic on your interface and see the data from there:

sudo tcpdump -i en0 -s 0 -A port 80

This will capture all traffic on port 80 and show the entire packet contents.

Answer 7

Kurt Sakaeda Author

Jul 19, 2006 7:54 PM in response to Nils C. Anderson

Dear Nils C.

Thank you for the suggestion. When I tried it I got:

$ curl -L https://www.emidas.com
curl: (47) Maximum (50) redirects followed

Any change in results is progress.

Your humble geek,

Kurt

Answer 8

Jul 20, 2006 2:10 PM in response to Kurt Sakaeda

Hi Kurt,
That obviously sounds like an infinite loop. Thus I assume that your server is sending a redirect that points to the same page. It could be bouncing back and forth between two pages, but that seems far less likely. If two pages are involved, I would expect it to be bouncing back and forth between the encrypted and unencrypted versions of the page in question.

To address this, you'll have to know a lot more about the communication between the client and the server. Often I suggest sniffing packets, but much of the communication in this case will be encrypted. There is a cool extension for Firefox and Mozilla called Live HTTP Headers that you can download from their site. It only shows you the HTTP headers, but that's actually what you want, so it cuts down on the "noise". Most importantly, it shows them decrypted, so in this case it might be the only way to get that detail.
--
Gary
~~~~
Marriage is the process of finding out what
kind of man your wife would have preferred.

Answer 9

Jul 20, 2006 8:18 PM in response to Gary Kerbaugh

I've seen that happen when the server sends back something "not quite standard" that a more lenient browser can figure out but a stricter, standards-following client can't. I've seen it happen when the redirect is to a protocol or other feature that isn't supported, so curl does the best it can. I've seen it happen when the server requests some sort of authentication or credentials that curl is not configured to handle or has no parameters set to provide. In each case, 'curl -L' winds up trying the same or another "not quite correct" URI, and the server again sends a redirect, which curl mishandles again, and so on.

Seeing the full headers as well as the full result (passing -i) might help diagnose what the "correct" access URI should be.
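
One way to look at the chain hop by hop, instead of letting -L run into its limit, is sketched below. The function name and the 10-hop cap are arbitrary choices of mine, and it assumes a curl recent enough to support the redirect_url variable of --write-out (7.18.2 or later, as far as I know).

```shell
# Follow a redirect chain by hand, printing each hop, so a loop is easy to spot.
# %{http_code} is the status of the response and %{redirect_url} is where
# curl would have gone next had -L been given (empty if there is no redirect).
trace_redirects() {
    url=$1
    hops=0
    while [ -n "$url" ] && [ "$hops" -lt 10 ]; do
        result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$url")
        echo "$url -> $result"
        url=${result#* }    # keep only the part after the status code
        hops=$((hops + 1))
    done
}

# Usage (host taken from the earlier post in this thread):
#   trace_redirects https://www.emidas.com
```

If the same URL shows up twice in the output, the server really is redirecting in a circle, and the headers from a single 'curl -i' on that URL are the next thing to examine.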

Answer 10

Jul 19, 2006 5:32 PM in response to Camelot

Thanks Camelot & Gary!
