My problem is this:
I want to be able to determine whether a website has updated its content...
In other words,
logo.gif is found on Google's server...
http://www.google.com/images/logo.gif
I have another copy of "logo.gif" on my hard drive...
I want to be able to find out if they updated the "logo" so I can download the new one (without downloading the whole thing)... How can this be done?
So far I've just checked the size of the file and compared it to the size of the picture on my hard drive, but the results are not always consistent...
Is there a way to check the date or the actual size of the file before I download it? Any information would be appreciated...
Sliver
Send the GET request to the server and just check the "Content-Length:" field in the HTTP headers of the server's response:
http://www.freesoft.org/CIE/RFC/1945/53.htm
Use HEAD instead of GET. It's exactly like GET but it will only give you the headers, not the data.
There are also some headers like 'if-modified-since' which can give you a special response code ('304 - not modified') if it didn't change, and the whole file if it did change (iirc).
Thomas
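A minimal sketch of the HEAD approach, written in Python rather than assembly to keep it short. The helper names `build_request` and `parse_response_head` are made up for this illustration, and the sample response is only shaped like the headers a server might send back; nothing here touches the network.

```python
def build_request(method, host, path, extra_headers=None):
    """Return the raw bytes of a body-less HTTP/1.0 request."""
    lines = [f"{method} {path} HTTP/1.0", f"Host: {host}", "Connection: close"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; one extra empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response_head(raw):
    """Split a raw response head into (status_code, {lowercased-name: value})."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


# Hypothetical sample response, for illustration only.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT\r\n"
          b"Content-Type: image/gif\r\n"
          b"Content-Length: 8558\r\n\r\n")
status, headers = parse_response_head(sample)
print(status, headers["last-modified"], headers["content-length"])
```

The bytes from `build_request("HEAD", ...)` are what you would write to the socket; the Last-Modified and Content-Length values can then be compared against the local copy.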
Thomas,
Is the syntax the same when using HEAD? And how is the request terminated? With a pair of CR LFs, or just a single CR LF?
(sorry, can't test it here)
HTTPHeader = HTTP/1.0 200 OK
Content-Length: 0
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:07:50 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=71b9be687dcb9758:TM=1017306470:LM=1017306470:S=AiTshn_2z-0; domain=.google.com; path=/; expires=Sun, 17-Jan-2038 19:14:07 GMT
Thanks Thomas (Got this from using HEAD instead of GET)
This is the template I used to send information...
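; wsprintf-style template: %s = path, %s = host, %lu = first byte of the range
HeadTemplate db "HEAD %s HTTP/1.0",0Dh,0Ah
             db "Host: %s",0Dh,0Ah
             db "Range: bytes=%lu-",0Dh,0Ah
             db "User-Agent: TesterAgent",0Dh,0Ah
             db "Connection: Close",0Dh,0Ah
             ; the second CR LF is the blank line that ends the request; 0 terminates the string
             db "Accept: text/*,image/*,application/*,*/*",0Dh,0Ah,0Dh,0Ah,0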
Two more questions, though...
1) Is there a way to get any more information about a file?
2) Does it only give the date it was modified in that format? Or can I get it some other way, like 032802-09:07:50GMT?
If anyone has parsed the Date line in the header... please help me out... I really don't want to have to write a parser for that
:):):)
Sliver
----EDIT----
Found out that the response is actually the same (except for the content-length) -- Not sure why this is
;USING HEAD
HTTPHeader = HTTP/1.0 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:17:03 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 0
Expires: Sun, 17 Jan 2038 19:14:07 GMT
;Using GET
HTTPHeader = HTTP/1.0 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 09:15:46 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 8558
Expires: Sun, 17 Jan 2038 19:14:07 GMT
Bazik:
The format is exactly the same as with GET; just write HEAD instead of GET. It's an HTTP request, and HTTP requests always end with a blank line, so there are two CRLFs at the end (the last line's terminator plus the blank line).
As for more information about the file: maybe there are some headers for this, but you won't get much more from the server. What data do you want to have?
As for the date format: I don't have code for parsing it, but there's probably some C code around..
About the HEAD and GET responses being the same: they are supposed to be. The only difference is that with HEAD, no content is sent to you.
Thomas
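For the date question, a short sketch in Python (not the assembly used elsewhere in this thread) showing that a standard library can do the parsing. The function names `parse_http_date`, `compact_stamp` and `remote_is_newer` are invented for this example; `parsedate_to_datetime` is the standard-library call doing the real work.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date header value into a timezone-aware UTC datetime."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:              # no usable zone given; assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def compact_stamp(value):
    """Reformat an HTTP date as MMDDYY-HH:MM:SSGMT."""
    return parse_http_date(value).strftime("%m%d%y-%H:%M:%S") + "GMT"


def remote_is_newer(last_modified, local_mtime):
    """Compare a Last-Modified value against a local POSIX timestamp."""
    return parse_http_date(last_modified).timestamp() > local_mtime


print(compact_stamp("Thu, 28 Mar 2002 09:07:50 GMT"))  # 032802-09:07:50GMT
```

`local_mtime` would typically come from `os.path.getmtime()` on the copy saved on disk.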
Thanks, Thomas!
Hmm... but the problem was that he wanted to get the size of the file. HEAD just sends the headers with "Content-Length: 0".
One way would be to send a GET with a small receive buffer (32 bytes?) and stop receiving once you have passed the blank line (the pair of CR LFs) that ends the headers.
Sorry, I missed that... It probably means that the server can't determine the content's size without sending the data.. You could break the connection after the GET headers, which isn't a very nice solution, but it works..
If the server supports if-modified headers that would be better to use.
Thomas
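The "read the GET headers, then hang up" idea could look roughly like this in Python. This is a sketch under assumptions: `read_head` is a made-up name, and `recv` stands for any callable that returns up to `bufsize` bytes, such as the `recv` method of a connected socket. An in-memory stream stands in for the socket here so the example runs offline.

```python
import io


def read_head(recv, bufsize=32):
    """Call recv(bufsize) until the blank line arrives; return the header bytes."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = recv(bufsize)
        if not chunk:          # peer closed before the headers were complete
            break
        data += chunk
    head, _, _ = data.partition(b"\r\n\r\n")
    return head


# Stand-in for a socket: headers followed by a large body we never want to read.
fake_socket = io.BytesIO(b"HTTP/1.0 200 OK\r\nContent-Length: 8558\r\n\r\n" + b"x" * 8558)
head = read_head(fake_socket.read)
print(head.decode("ascii"))
print("bytes consumed:", fake_socket.tell())
```

With a real socket you would close the connection right after `read_head` returns. The small buffer keeps the amount of body data read by accident to at most one buffer's worth.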
When I send this:
GET /images/logo.gif HTTP/1.1
Host: www.google.com
Connection: close
If-Modified-Since: Tue, 28 Mar 2002 12:00:00 GMT
The server responds with:
HTTP/1.1 304 Not Modified
Content-Length: 0
Connection: Close
Server: GWS/2.0
Content-Type: text/html
Date: Thu, 28 Mar 2002 14:28:02 GMT
(without further data)
When I send this:
GET /images/logo.gif HTTP/1.1
Host: www.google.com
Connection: close
If-Modified-Since: Mon, 28 Mar 1994 12:00:00 GMT
This is returned:
HTTP/1.1 200 OK
Connection: Close
Server: GWS/2.0
Date: Thu, 28 Mar 2002 14:32:17 GMT
Last-Modified: Wed, 28 Nov 2001 20:50:08 GMT
Content-Type: image/gif
Content-Length: 8558
Expires: Sun, 17 Jan 2038 19:14:07 GMT
[DATA]
Including the full image.
Thomas
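The exchange above can be sketched as a conditional download in Python. The names `if_modified_since` and `fetch_if_newer` are hypothetical, and using the local file's modification time as the If-Modified-Since date is an assumption made for this example; storing the server's own Last-Modified value would work as well. The function is defined but not called, so nothing is fetched when the snippet is loaded.

```python
import http.client
import os
from email.utils import formatdate


def if_modified_since(mtime):
    """Format a POSIX timestamp (e.g. a file's mtime) as an HTTP date."""
    return formatdate(mtime, usegmt=True)


def fetch_if_newer(host, path, local_file):
    """Download path to local_file only if the server reports a change.

    Returns True if a new copy was written, False on 304 Not Modified.
    """
    headers = {}
    if os.path.exists(local_file):
        headers["If-Modified-Since"] = if_modified_since(os.path.getmtime(local_file))
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        if response.status == 304:      # unchanged: no body is sent
            return False
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        body = response.read()
    finally:
        conn.close()
    with open(local_file, "wb") as f:
        f.write(body)
    return True


print(if_modified_since(1017316800))
```

A call such as `fetch_if_newer("www.google.com", "/images/logo.gif", "logo.gif")` would then transfer the image only when the server answers 200.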
Nice!
This solved Sliver's first question. But what about the second one?
Is there a way to check the date or the actual size of the file before I download it?
Is there a way to get the size without sending a GET and breaking after the headers?
It's up to the server to include or to leave out the filesize. If it can't determine the size beforehand (like with non-buffered PHP pages), it won't give you the size.
In response to a HEAD request, the server should give you the size but obviously it doesn't.
You could try byte ranges (Range: bytes=0-10) to get only a few bytes, but the server is not required to support this; if it doesn't, it will ignore the header and send the whole file.
But Sliver's problem is solved with the If-Modified-Since header.
Thomas
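If a server does honor byte ranges, it answers with status 206 and a Content-Range header whose last field is the full size of the file. A small Python sketch of building the request header and reading that size back; `range_header` and `total_size` are names invented here, and the sample values are illustrative rather than taken from a real response.

```python
import re


def range_header(first, last):
    """Value for a Range header asking for bytes first..last inclusive."""
    return f"bytes={first}-{last}"


def total_size(content_range):
    """Extract the full size from e.g. 'bytes 0-10/8558'; None if unknown."""
    match = re.fullmatch(r"bytes \d+-\d+/(\d+|\*)", content_range.strip())
    if match is None or match.group(1) == "*":
        return None
    return int(match.group(1))


print(range_header(0, 10))
print(total_size("bytes 0-10/8558"))
```

If the response comes back as a plain 200 instead of 206, the server ignored the range and is sending the entire file.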
Thanks Thomas -- and thanks for the follow-up questions, bAZiK...
Sliver