Hi, I would like to know if anyone could point me to some documentation on the HTTP protocol and on how to request a file and then download it. So far this is what I have done: connected to port 80 of the host, received the FD_WRITE message (ready to write), and sent the request. However, I am not sure if my request is correct. I have tried to figure it out from Iczelion's http15 downloader, but it's quite long and complicated :( What should the request be?
    request db "GET http://www.domain.com/index.html HTTP/1.0",0dh,0ah
or just:
    request db "GET index.html HTTP/1.0",0dh,0ah
or with the "/" before index.html, or what? After I have sent the request, do I then wait for the FD_READ message and recv the data? skud.
I'm not sure but it should be documented in the HTTP RFC. See http://www.ietf.org for more info.
Hi, to get the "right" index file, you must be sure of the extension (htm or html) before doing a GET /index.htm HTTP/1.0 or GET /index.html HTTP/1.0. Otherwise it may result in a 404 Not Found error. The right way to do this is to send "GET / HTTP/1.0" and let the server fetch the right index file for you. Every requested path must begin with a "/", followed by the path of the file you want to get. And take note, just to be safe, every request must match the case of the filename of the file that you want to get, because case matters to Apache or any *nix-based HTTP server. Some of the minor stuff varies with different HTTP servers. I agree with karim that it is recommended that you read the HTTP RFC :). Also, I strongly recommend you get NetCat (www.l0pht.com/~weld/netcat). Pretty cool tool for experimenting with protocols. Happy coding - clip
The HTTP protocol is not really hard. The request/reply sequence is text based so it's easy to parse. If the only thing you want is to download a file, you can send some standard headers like this:
    GET /folder/file.ext HTTP/1.1
    Host: www.hostname.com
    Accept: */*
    Connection: close
Then you get the response:
    HTTP/1.1 XXX text
    Responseheader1: responsedata1
    Responseheader2: responsedata2
    Responseheader3: responsedata3
    Responseheader4: responsedata4

    Actual data here
The XXX is the response code (text is the text version of the code). If the code is 200 (OK), you can receive the data immediately. It's harder when you get a 3xx code, as those codes indicate the file has moved somewhere else (probably a redirection). The response headers contain some useful info, like Content-Length (the size of the file you're downloading), the new location for a 3xx code, etc. I strongly suggest you read the RTF on the HTTP protocol. You don't have to read the whole thing, just the parts that are important. Thomas
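For anyone who wants to see the response layout described above in code, here is a minimal sketch in Python rather than assembly. The function name `parse_response` and the sample response are made up for illustration; the only thing it relies on is that the headers end at the first empty line.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_code, reason, headers, body)."""
    # The header block ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line, e.g. "HTTP/1.1 200 OK": version, code, reason text.
    parts = lines[0].split(" ", 2)
    status_code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Header names are case-insensitive, so store them lower-cased.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, reason, headers, body


# Hypothetical sample response, not captured from a real server.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
code, reason, headers, body = parse_response(sample)
print(code, reason, headers["content-length"], body)
```

In a real client the bytes would arrive in pieces from recv, so you would keep appending to a buffer until the empty line shows up before parsing.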
Thanks for the response! I now understand how the request is sent and am sending the request... but I don't receive the FD_READ message after I have sent it, so I can't receive the response. Why is this? Here is the request I am sending:
    request db "GET /index.html HTTP/1.0",0dh,0ah
The number of bytes to send is correct and it is in the right case. When I put it after the IP in my browser it goes to the page, so the request is OK, right? I have also had a look around for the HTTP RTF you talk of (or RFC, any difference?) and I can't find it anywhere. (I came up with some interesting results involving Britney Spears when searching on Yahoo, lol.) If anyone knows of a link then that would be great. clip: I have had a look at the NetCat site. It looks like a great tool, but unfortunately I am using Windows 98 (Second Edition) and it is only for Unix / Linux and NT. skud.
Sorry, of course I meant RFC, I just didn't notice I used the wrong acronym :rolleyes:. You can find them all at rfc-editor.org. rfc2616 is the proposed standard for HTTP 1.1. About the FD_READ message: did you include it in the mask set by WSAAsyncSelect? Edit: I misread a thing in your post. Your request is not complete. The request should be ended with a CRLF on a line of its own (an empty line), and right now you only have the CRLF that ends the request line. The server thinks your request is not finished yet and will not send anything. That's why you don't get the FD_READ message. So it should be something like:
    GET /file.ext HTTP/1.1
                              <-- this one (an empty line) indicates end of request
But most servers will not accept this request because they need more headers. A general request that works most of the time (again ended by an empty line) is:
    GET /file.ext HTTP/1.1
    Connection: close
    Accept: */*
    Host: www.hostname.com
Most servers require the request above as the minimum. Connection: close means that the connection will be closed when all the data is sent. Accept: */* indicates that the client accepts all MIME types. The name after Host: should be the domain name of the website you download from. Thomas
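To make the "empty line at the end" point concrete, here is a small Python sketch that builds the general request above as raw bytes. The helper name `build_get_request` is hypothetical, and the host and path are just the placeholders from the post.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a GET request that is terminated by an empty line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: */*",
        "Connection: close",
        "",  # joining with CRLF turns these two empty strings
        "",  # into the final CRLF CRLF that ends the request
    ]
    return "\r\n".join(lines).encode("ascii")


req = build_get_request("www.hostname.com", "/file.ext")
print(req)
```

The last four bytes of `req` are CR LF CR LF: one CRLF ends the last header line, the second one is the empty line the server is waiting for.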
skud: "i have had a look at the NetCat site. it looks like a great tool but unfortunately i am using Windows 98 (Second Edition) and it is only for Unix / Linux and NT." The site I posted has the Windows port of netcat. It was ported to Win32 by Weld Pond of l0pht. Just an addition, if my memory serves me right: CR LF = Carriage Return, Line Feed. Thomas: "GET /file.ext HTTP/1.1 <-- this one indicates end of request But most servers will not accept this request because they need more headers." I beg to disagree with that, Thomas. It is the least possible request, though not really proper :). I once made a CGI scanner that sends only that single line as the command passed to an HTTP server. I tested it on various commonly used HTTP servers (Apache, IIS...) and so far it has never failed me. - clip
Have you guys already tried using wininet.dll? This magic DLL makes all HTTP-related things really easy: you open a URL as a file, read from it, and close it. But I heard it doesn't exist on Win95 without Internet Explorer...
clip: I know it's a valid request, but I have never seen a server that also accepts it. Most servers really want at least the Host header. With only that added it will work, but it's not that hard to add a few extra headers, just to be on the safe side. Maybe CGI scripts don't require those headers, I wouldn't know, but for me it doesn't work without the headers. Example, www.altavista.com with GET / HTTP/1.1:
    HTTP/1.1 400 HTTP/1.1 requires host
    Server: Resin/1.2.1
    Expires: Thu, 01 Dec 1994 16:00:00 GMT
    Content-Type: text/html
    Content-Length: 158
    Date: Sun, 03 Jun 2001 06:52:13 GMT

    400 HTTP/1.1 requires host
    400 HTTP/1.1 requires host
    Resin 1.2.1 -- Mon Dec 4 10:01:01 PST 2000
(With Host: www.altavista.com added it does work.) vecna: You can probably use those functions, but I like to have full control over what's happening, and the protocol is quite simple. Thomas
Thomas: I have tried connecting to altavista.com and sending "GET / HTTP/1.1" and it did work as predicted, it gave me the index. I'm no expert but I've already done this before, when I tested my scanner on different HTTP servers, looking for blacklisted CGIs and HTMs... If you've seen most of the code of CGI scanners floating around the net, those coded by experts and kiddies, you'll see a pattern there: they only send one line of command followed by 2 CRLFs and don't include the hostname. I and a lot of other people have tried many of these apps on different servers, and the servers replied properly to the scan. That one line without the hostname is really the simplest way to get any file out of an HTTP server. Though I agree with you about adding headers just to be safe :). Does your browser connect to a proxy to surf? - clip
How did you test it then? I don't use a proxy, but I also don't use my browser to test it. If I execute telnet www.altavista.com 80 and then type:
    GET / HTTP/1.1
I get a 400 response (requires host). What do you use to test it? Thomas
Thomas: I found the problem, it's with the HTTP version :(, try GET / HTTP/1.0 ... it will work. We were both doing different things, and I thought it was just a version number and didn't matter. I should've tried your command line sooner... what a misunderstanding... :) cheers, - clip
You're right, it works with 1.0. Maybe v1.0 of the protocol doesn't require it... Thomas
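The difference the two of you ran into can be written down as a small check. This is only a sketch in Python: `host_header_ok` is a made-up helper, and it encodes just the one rule seen in this thread (an HTTP/1.1 request needs a Host header, which is what the 400 above says and what rfc2616 requires; a 1.0 request does not).

```python
def host_header_ok(request: bytes) -> bool:
    """Return False for an HTTP/1.1 request that has no Host header."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # The version is the last word on the request line.
    version = lines[0].rsplit(" ", 1)[-1]
    # Header names are case-insensitive.
    has_host = any(line.lower().startswith("host:") for line in lines[1:])
    return has_host or version != "HTTP/1.1"


print(host_header_ok(b"GET / HTTP/1.0\r\n\r\n"))
print(host_header_ok(b"GET / HTTP/1.1\r\n\r\n"))
print(host_header_ok(b"GET / HTTP/1.1\r\nHost: www.altavista.com\r\n\r\n"))
```

The three calls correspond to clip's scanner request, the telnet test that got the 400, and the same test with the Host header added.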
Please help me. I need to add a download resume function to my existing and working HTTP program. I think the only thing I have to do is form the HTTP request in the proper way. I don't know how! Here is what I have done. I looked at Iczelion's HTTP example, but I am a beginner and it is too complicated for me. I don't know what to do next or what is wrong.
    .DATA
    request db "GET %s HTTP/1.0",13,10,13,10,0
    .CODE
    invoke connect, oursocket, addr hostinfo, SIZEOF sockaddr_in ;connect the socket
    invoke wsprintf, addr databuffer, addr request, addr url ;format the http request
    invoke send, oursocket, addr databuffer, eax, NULL ;send the http request
Thanx, Radim. Czech Republic
First of all you'll have to use HTTP 1.1, as 1.0 does not support partial content. Then get rfc2616 from rfc-editor.org and take a look at the Range header (section 14.35 of rfc2616). With this header you can specify a certain range of bytes you want to retrieve.
If the response is 206 (partial content), the partial content is sent back. If it's 200, partial content is not supported or not allowed, and the full file is sent. But it's all explained in the RFC. Thomas
I have read section 14.35 of rfc2616, but I don't understand it. Simple situation: I have a binary file on the server which is 300 KB, and I have downloaded 152 KB so far. I need to make an HTTP request to download the rest of the file. How do I do this? I realize that I have to use HTTP 1.1. Can you please write this bit of code for me?
Sorry, I can't code it for you, and I've already been writing HTTP stuff the whole day (for my file transfer program). But it's not that hard to write: assuming 152KB is 152*1024=155648 bytes, you need everything from byte #155648 to the end. So the request will be:
    GET /url HTTP/1.1
    Host: www.hostname.com
    Accept: */*
    Connection: close
    Range: bytes=155648-
The 155648- means 'from byte #155648 till the end of the file'. If everything's OK, you get this response:
    HTTP/1.1 206 Partial content
    Connection: close
    etc: etc

    Actual data comes here
It's quite easy to do this, just send the request and receive the response. Thomas
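The resume logic described above can be sketched in a few lines of Python. The names `build_resume_request` and `how_to_save` are hypothetical, and the host and path are the placeholders from the post; the offset arithmetic and the 206/200 distinction are the ones Thomas gives.

```python
def build_resume_request(host: str, path: str, offset: int) -> bytes:
    """Ask for everything from byte `offset` to the end of the file."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: */*",
        "Connection: close",
        f"Range: bytes={offset}-",  # open-ended range: offset .. end of file
        "",
        "",  # the two empty strings give the final CRLF CRLF
    ]
    return "\r\n".join(lines).encode("ascii")


def how_to_save(status_code: int) -> str:
    """Decide what to do with the body based on the response code."""
    if status_code == 206:
        return "append"     # server sent only the missing part
    if status_code == 200:
        return "overwrite"  # server ignored Range and sent the whole file
    return "fail"


offset = 152 * 1024  # bytes already on disk, assuming 1 KB = 1024 bytes
resume_req = build_resume_request("www.hostname.com", "/url", offset)
print(offset)
print(how_to_save(206), how_to_save(200))
```

In practice the offset would simply be the current size of the partly downloaded file on disk, which avoids any guessing about what "152 KB" means.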
Some very useful info here. I have the rfc2616.txt and have begun to read it, but it's going to take a while because for some reason it comes out all weird on my printer :( From what you have told me and from what I have learned from this document, my request should look like this:
    request db "GET /index.html HTTP/1.1",0Dh,0Ah
            db "Connection: close",0Dh,0Ah
            db "Accept: */*",0Dh,0Ah
            db "Host: www.napster.com",0Dh,0Ah
            db "",0
I'm just using napster because it's the first thing I thought of. But I am still having the same problem, not receiving the FD_READ message... I have the final CRLF and my send is returning the number of bytes that are there... so surely there should be some response, even if it's an error, but there is nothing. Please help (I'm sure you will). Thx. skud.
Have you set the network event mask correctly with WSAAsyncSelect? Could you post a piece of your code? Thomas. Btw, I got the RFC as a PDF file, from rfc-editor iirc, but I don't know where from exactly.
So there's nothing wrong with my request? I receive all other messages when I should, so I think WSAAsyncSelect is set OK.
    invoke WSAAsyncSelect,socketdesc,hwnd,WM_SOCKET,FD_CONNECT + FD_READ + FD_WRITE + FD_CLOSE
    invoke send,socketdesc,ADDR request,SIZEOF request,0
I cut and pasted WSAAsyncSelect from a source that I made that is fully working, so I can't see what the problem is. I'm pretty sure it's the request. skud.
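One way to check the request itself is to look at the exact bytes for the empty line Thomas mentioned earlier. The Python sketch below transcribes the db lines from the post above by hand, assuming all of them (including the trailing 0) get sent; `has_end_of_request` is a made-up helper name.

```python
def has_end_of_request(request: bytes) -> bool:
    """True if the buffer contains the empty line that ends the headers."""
    return b"\r\n\r\n" in request


# Hand transcription of the db lines posted above (an assumption about
# what actually goes over the wire, including the trailing 0 byte).
posted = (b"GET /index.html HTTP/1.1\r\n"
          b"Connection: close\r\n"
          b"Accept: */*\r\n"
          b"Host: www.napster.com\r\n"
          b"\x00")

# Same headers, but with one more CRLF instead of the 0 byte.
fixed = posted.rstrip(b"\x00") + b"\r\n"

print(has_end_of_request(posted))
print(has_end_of_request(fixed))
```

If the transcription is right, the posted buffer has a CRLF after the Host line but no empty line after it, which would match the "server is still waiting" behaviour described earlier in the thread. It may also be worth checking how many bytes SIZEOF request really covers when the data is spread over several db lines.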