My problem is this:

I want to be able to determine if a website has updated its content...

In other words,

logo.gif is found on Google's server...

http://www.google.com/images/logo.gif

I have another copy of "logo.gif" on my hard drive...

I want to be able to find out whether they have updated the logo, without downloading the whole file just to check, so that I can download the new one only when it has changed... How can this be done?

So far I've just checked the size of the remote file and compared it to the size of the picture on my hard drive, but the results are not always consistent...

Is there a way to check the date or the actual size of the file before I download it? Any information would be appreciated...

Sliver
Posted on 2002-03-28 01:34:57 by Sliver
Send the GET request to the server and just check the "Content-Length:" field of the HTTP header in the server's response:

http://www.freesoft.org/CIE/RFC/1945/53.htm
Posted on 2002-03-28 01:42:56 by bazik
Use HEAD instead of GET. It's exactly like GET but it will only give you the headers, not the data.
There are also some request headers like 'If-Modified-Since' which make the server answer with a special response code ('304 Not Modified') if the file didn't change, and with the whole file if it did change (iirc).
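A minimal Python sketch of the idea, not taken from the thread: it builds a HEAD request by hand and splits a raw response into a status code and header fields. The function names `build_head_request` and `parse_response_headers` are made up for illustration, and sending the bytes over a socket is left out.

```python
def build_head_request(host: str, path: str) -> bytes:
    """Build a HEAD request that ends with the required blank line."""
    lines = [
        f"HEAD {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # together, the two empty entries make the
        "",  # request end in CRLF CRLF after joining
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_headers(raw: bytes):
    """Return (status_code, headers) with header names lowercased."""
    # Everything before the first blank line is the header block.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers
```

Comparing the "last-modified" value from a HEAD response with the one saved at the last download is then enough to decide whether to fetch the file again.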

Thomas
Posted on 2002-03-28 02:44:28 by Thomas
Thomas,
is the syntax the same when using HEAD? And how is the request terminated? With a pair of CR LFs, or just a single CR LF?
(sorry, can't test it here)
Posted on 2002-03-28 02:51:59 by bazik


HTTPHeader = HTTP/1.0 200 OK
Content-Length: 0
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:07:50 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=71b9be687dcb9758:TM=1017306470:LM=1017306470:S=AiTshn_2z-0; domain=.google.com; path=/; expires=Sun, 17-Jan-2038 19:14:07 GMT


Thanks, Thomas. (I got this by using HEAD instead of GET.)
This is the template I used to send the request...



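; printf-style request template: first %s = path, second %s = host,
; %lu = first byte offset for the Range header
HeadTemplate db "HEAD %s HTTP/1.0",0Dh,0Ah
             db "Host: %s",0Dh,0Ah
             db "Range: bytes=%lu-",0Dh,0Ah
             db "User-Agent: TesterAgent",0Dh,0Ah
             db "Connection: Close",0Dh,0Ah
             db "Accept: text/*,image/*,application/*,*/*",0Dh,0Ah
             db 0Dh,0Ah,0    ; blank line ends the request, then the NUL terminator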


Two more questions though...
1) Is there a way to get any more information about a file?
2) Does it only give the date it was modified like that? Or can I get it some other way, like 032802-09:07:50GMT?

If anyone has parsed the Date line in the header... please help me out... I really don't want to have to create a parser for that.
:):):)

Sliver

----EDIT----

I found out that the HEAD response is actually the same as the GET response (except for the Content-Length) -- not sure why this is



;USING HEAD
HTTPHeader = HTTP/1.0 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:17:03 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 0
Expires: Sun, 17 Jan 2038 19:14:07 GMT

;Using GET
HTTPHeader = HTTP/1.0 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:15:46 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 8558
Expires: Sun, 17 Jan 2038 19:14:07 GMT
Posted on 2002-03-28 03:14:04 by Sliver
Bazik:
The format is exactly the same as with GET, just write HEAD instead of GET. It's an HTTP request, and HTTP requests always end with a blank line, so there are two CRLFs at the end (the terminator of the last header line, then the blank line).

1) Is there a way to get any more information about a file?

Maybe there are some headers for this but you won't get much more from the server. What data do you want to have?

2) Does it only give the date it was modified like that? Or can I get it some other way like 032802-09:07:50GMT?

I don't have code for this but there's probably some C code around..
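For what it's worth, here is a hedged sketch in Python rather than C, since its standard library already parses HTTP dates. `compact_stamp` and `remote_is_newer` are hypothetical helper names, and the output format is simply the MMDDYY-HH:MM:SSGMT layout asked about above.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def compact_stamp(http_date: str) -> str:
    """Convert an HTTP date such as a Date or Last-Modified value
    into the MMDDYY-HH:MM:SSGMT layout."""
    # parsedate_to_datetime understands the "Thu, 28 Mar 2002 ..." form;
    # astimezone() normalises the result to UTC before formatting.
    moment = parsedate_to_datetime(http_date).astimezone(timezone.utc)
    return moment.strftime("%m%d%y-%H:%M:%SGMT")


def remote_is_newer(remote_last_modified: str, local_last_modified: str) -> bool:
    """Compare two HTTP dates without comparing the raw strings."""
    return (parsedate_to_datetime(remote_last_modified)
            > parsedate_to_datetime(local_last_modified))
```

Comparing parsed dates instead of strings avoids false alarms when two servers format the same instant differently.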

Found out that the response is actually the same (except for the content-length) -- Not sure why this is


It is supposed to be the same, the difference is that with HEAD, there isn't any content sent to you.

Thomas
Posted on 2002-03-28 03:34:27 by Thomas
Thanks, Thomas!




It is supposed to be the same, the difference is that with HEAD, there isn't any content sent to you.

Thomas


Hmm... but the problem was that he wanted to get the size of the file. HEAD just sends the header with "Content-Length: 0".
One way would be to send a GET with a small receive buffer (32 bytes?) and stop receiving once you have passed the blank line (the pair of CR LFs) that ends the headers.
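A rough Python sketch of that approach, under the assumption that `stream` is any object with a `read(n)` method (for a real connection, something like `sock.makefile("rb")`). The names `read_header_block` and `content_length` are invented for this example, and the fake response below stands in for the server.

```python
import io


def read_header_block(stream, chunk_size: int = 32) -> bytes:
    """Read small chunks and stop as soon as the blank line has arrived."""
    received = b""
    while b"\r\n\r\n" not in received:
        chunk = stream.read(chunk_size)
        if not chunk:  # connection closed before the headers finished
            break
        received += chunk
    # Drop whatever part of the body came along in the last chunk.
    return received.split(b"\r\n\r\n", 1)[0]


def content_length(header_block: bytes):
    """Return the Content-Length as an int, or None if the server omitted it."""
    for line in header_block.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    return None


# Stand-in for a GET response: headers, blank line, then the image data.
fake_response = io.BytesIO(
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Length: 8558\r\n"
    b"\r\n" + b"GIF89a" + bytes(100)
)
print(content_length(read_header_block(fake_response)))
```

At most one chunk of body data is read past the headers, so the cost stays small however large the file is.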
Posted on 2002-03-28 08:14:48 by bazik

Hmm... but the problem was that he wanted to get the size of the file. HEAD just sends the header with "Content-Length: 0".
One way would be to send a GET with a small receive buffer (32 bytes?) and stop receiving once you have passed the blank line (the pair of CR LFs) that ends the headers.


Sorry, I missed that... It probably means that the server can't determine the content's size without sending the data. You could break the connection after receiving the GET headers; it isn't a very nice solution, but it works.
If the server supports the If-Modified-Since header, that would be better to use.

Thomas
Posted on 2002-03-28 08:23:55 by Thomas
When I send this:


GET /images/logo.gif HTTP/1.1
Host: www.google.com
Connection: close
If-Modified-Since: Thu, 28 Mar 2002 12:00:00 GMT


The server responds with:


HTTP/1.1 304 Not Modified
Content-Length: 0
Connection: Close
Server: GWS/2.0
Content-Type: text/html
Date: Thu, 28 Mar 2002 14:28:02 GMT

(without further data)

When I send this:


GET /images/logo.gif HTTP/1.1
Host: www.google.com
Connection: close
If-Modified-Since: Mon, 28 Mar 1994 12:00:00 GMT


This is returned:


HTTP/1.1 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 14:32:17 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 8558
Expires: Sun, 17 Jan 2038 19:14:07 GMT

[DATA]

Including the full image.
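The same two exchanges could be sketched in Python with the standard `http.client` module, which writes the request lines for you. This is only an illustration: `http_date` and `fetch_if_modified` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import http.client
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format a timezone-aware datetime the way HTTP headers expect."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def fetch_if_modified(host: str, path: str, since: datetime):
    """Return the new content, or None if the server answers 304."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path,
                     headers={"If-Modified-Since": http_date(since)})
        response = conn.getresponse()
        if response.status == 304:  # not modified: nothing to download
            return None
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return response.read()      # modified: the full file
    finally:
        conn.close()
```

Passing the Last-Modified time saved from the previous download as `since` means the file is transferred only when the server's copy is newer.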

Thomas
Posted on 2002-03-28 08:35:20 by Thomas
Nice!
This solved Sliver's first question. But what about the second one?


Is there a way to check the date or the actual size of the file before I download it?


Is there a way to get the size without sending a GET and breaking after the headers?
Posted on 2002-03-28 08:47:16 by bazik
It's up to the server whether to include the file size or leave it out. If it can't determine the size beforehand (as with non-buffered PHP pages), it won't give you the size.
In response to a HEAD request, the server should give you the size, but this one obviously doesn't.
You could try byte ranges (Range: bytes=0-10) to fetch only a few bytes, but it's unlikely that the server will support this for normal content like images or HTML.
But Sliver's problem is solved with the If-Modified-Since header.
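If a server does honour the byte range, it answers "206 Partial Content" with a Content-Range header of the form "bytes first-last/total", and the total is the full file size. A small Python sketch for reading that value follows; the helper names are invented, and a server that ignores the range simply sends 200 with the whole file and no Content-Range at all.

```python
import re


def range_header(first: int, last: int) -> str:
    """Build the request header line for an inclusive byte range."""
    return f"Range: bytes={first}-{last}"


def total_size_from_content_range(value: str):
    """Extract the full size from a Content-Range value, or None."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None or match.group(3) == "*":
        return None  # malformed value, or the server doesn't know the size
    return int(match.group(3))
```

For example, `total_size_from_content_range("bytes 0-10/8558")` gives the size without fetching more than eleven bytes; the value 8558 here is just borrowed from the Content-Length seen earlier in the thread.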

Thomas
Posted on 2002-03-28 09:09:47 by Thomas
Thanks Thomas -- and thanks for the follow-up questions, bAZiK...

Sliver
Posted on 2002-03-28 12:12:43 by Sliver