Hello community :-)

I don't know how to say it... I'm a bit depressed.
With your help I've worked my way to a functional tool, but now there are new mysterious problems I don't understand.

The tool now works the way I want: it reads the full HTML page, searches for all links inside it, and shows them in a listbox.
So I should be happy, but I'm not :-(
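The tool itself is written in assembly and its source is only attached, not shown. As a rough illustration of the same idea (read the page, collect every link), here is a minimal Python sketch using the standard-library html.parser module. The names LinkCollector and extract_links and the sample page are hypothetical. It works on already-decoded text, so it only gives correct results if the page's bytes were decoded with the page's real encoding.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    """Return all href targets of <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links

page = '<p><a href="http://example.com/a">A</a> <A HREF="/b">B</A></p>'
print(extract_links(page))  # ['http://example.com/a', '/b']
```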

Inside the links, and in a test text file (which I had the program write from the received HTML data), there are black boxes, and other characters are wrong too.
I don't know how to explain it, so I've attached the source; if you like, you can see the problem for yourself.

What's wrong?
Do I have to read the HTML page in a different way?
(I tested whether the listbox can't show , but it can, so it must be my mistake.)
(When I open the saved txt file in edit, there are the black boxes; if I open it in WordPad there are no black boxes, but in both the and spaces are wrong.)

Since my English (luckily there's a spell checker here) is as bad as my coding :-) I don't know what I should search the archive for.

Thank you for helping me :-)
Posted on 2006-04-29 12:25:13 by xanthos
Make sure that the listbox's encoding is the same as the HTML's encoding. The easiest way to do this is to use Unicode, because ANSI is too codepage-dependent, which can cause many bugs.

I don't know if that's the cause here, but when you deal with the internet it's a good habit to use Unicode, so you can support every national character (including Chinese, Japanese, etc.).
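A minimal Python sketch of that mismatch, assuming the page was sent as UTF-8 and the program interprets the bytes with the Windows-1252 ANSI code page (the sample word is made up). The same bytes give correct text only when decoded with the encoding they were written in:

```python
# UTF-8 bytes for a German word, as they might arrive from a web server
raw = "Grüße".encode("utf-8")

# Decoding with the encoding the page actually used gives the right text
print(raw.decode("utf-8"))   # Grüße

# Decoding the same bytes with an ANSI code page (here Windows-1252) garbles them:
# each non-ASCII character turns into two wrong characters
print(raw.decode("cp1252"))  # GrÃ¼ÃŸe
```

Which wrong characters appear depends on the code page; bytes that a code page doesn't map to a printable character are typically what editors draw as boxes.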
Posted on 2006-04-29 13:00:11 by ti_mo_n
Hmm, black boxes etc... this would be for "national" characters, right? I.e., non-ASCII/ANSI ones, like and the like? This might have to do with the encoding used on the website.

Sometimes those national characters will be written as an HTML entity like &aelig; (or similar), and that will show up fine. Other times the page will be encoded in UTF-8, which uses multiple bytes for each non-ASCII character. If you try to open such a page in an ASCII/ANSI app (like your program, the Win9x version of Notepad, or NT Notepad on files without a BOM), you'll probably get those "black boxes".
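The three cases mentioned here can be checked quickly in Python. This is only a sketch, and has_utf8_bom is a hypothetical helper name: an entity such as &aelig; is plain ASCII text in the page, UTF-8 needs more than one byte for a character like æ, and a UTF-8 file may start with the byte order mark EF BB BF.

```python
import codecs
import html

# An HTML entity is plain ASCII in the page, so any ANSI decoding keeps it intact
print(html.unescape("&aelig;"))   # æ

# In UTF-8, ASCII characters take one byte, but "æ" takes two
print(len("a".encode("utf-8")))   # 1
print(len("æ".encode("utf-8")))   # 2

def has_utf8_bom(data: bytes) -> bool:
    """True if the byte buffer starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

print(has_utf8_bom(codecs.BOM_UTF8 + "æ".encode("utf-8")))  # True
print(has_utf8_bom("æ".encode("utf-8")))                     # False
```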

Seems like your little quest is turning into a big adventure :) Unicode can be a complicated issue.
Posted on 2006-04-29 13:05:44 by f0dder
Well, I saw no such characters, so I assume that:
1- you use WinInet to perform the web fetch
2- you use 'your browser's default settings' to do so
3- your browser has some god-awful default MIME type

Would you mind sniffing and posting the HTTP request your app makes, to confirm this?
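While sniffing the traffic, the server's response is worth a look too: its Content-Type header usually names the page's charset. A minimal Python sketch that extracts the charset from such a header value with the standard email.message.Message class. The helper name is hypothetical, and falling back to ISO-8859-1 is an assumption based on the old HTTP/1.1 (RFC 2616) default for text types. A page can also declare its charset in a <meta> tag, which this sketch does not check.

```python
from email.message import Message

def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value, lowercased.

    Falls back to `default` (an assumed value) when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)

print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # iso-8859-1
```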
Posted on 2006-04-30 03:09:47 by Homer
I do see those things here, and it does look like UTF-8 encoded text. Also, your parser has a few hiccups :)
Posted on 2006-04-30 03:15:43 by f0dder
Good morning,
I don't know what happened; I'm sure I posted here late last night,
but today I see it's not there *wonder*

So I'll say it again :-)
Thank you for pointing me in the right direction again!

Yes, that's the problem, thank you.
I've now browsed the archive for Unicode and found some samples; now I have to understand them, and then I will try to solve the problem.
I also read some HTML tutorials and found out that, for example, "%20" in a URL is the percent-encoding for a " " space.
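"%20" is percent-encoding: a "%" followed by the two hex digits of a byte value, and 0x20 is the ASCII code of the space. Non-ASCII characters in URLs are usually percent-encoded as their UTF-8 bytes, which ties in with the Unicode issue. A short Python sketch with urllib.parse (the file name is made up):

```python
from urllib.parse import quote, unquote

# %20 is the byte 0x20, which is a space in ASCII
print(unquote("my%20file.zip"))  # my file.zip

# A non-ASCII character becomes one %XX escape per UTF-8 byte
print(quote("ä"))                # %C3%A4
print(unquote("%C3%A4"))         # ä
```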

Also, by great luck I found this: InStringx
So my little tool can now pick out the found links from a given host, like rapidshare :-)
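If the host filter is a plain substring search, it will also accept links that merely mention the host name somewhere in the path. A minimal Python sketch that compares the parsed host name instead, assuming the links are absolute URLs; links_from_host and the example URLs are hypothetical:

```python
from urllib.parse import urlparse

def links_from_host(links, host):
    """Keep only links whose host name is `host` or one of its subdomains."""
    host = host.lower()
    result = []
    for link in links:
        # .hostname is lowercased by urlparse, or None if the link has no host
        name = urlparse(link).hostname or ""
        if name == host or name.endswith("." + host):
            result.append(link)
    return result

links = [
    "http://rapidshare.de/files/123/a.zip",
    "http://www.rapidshare.de/files/456/b.zip",
    "http://example.com/rapidshare.de.html",  # mentions the host only in the path
]
print(links_from_host(links, "rapidshare.de"))  # keeps the first two links only
```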

Yes, I started programming without thinking about everything that needs to be done, just the way a beginner programs :-)
I don't know anyone around here who uses asm (I'm from Germany); they use C, VB, etc. and tell me I'm wasting my time learning a bit of asm.
But personally I feel more like a programmer when I try to learn asm than when I see how they "click click" their self-made programs together.

Also, your parser has a few hiccups

Hm, what did I do wrong?
Aren't the occasionally wrong link results caused by the Unicode text?
I think if I use Unicode those wrong results will be solved; I have to test this.
Sorry, I didn't fully understand your example, so I didn't use it; I don't like to copy/paste things I don't understand and then "boom" :-)
I'm not really sure what it does, but for now I think:
it searches for a given string in the given buffer and returns the position of the found string.
But then, when I search for the next one, do I have to pass the last found position to search for the next entry?
That way I don't end up in a loop that finds the same string every time.
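The idea described above is the usual pattern: after each match, start the next search just past the previous one, and stop when nothing more is found. A minimal Python sketch of that loop using str.find with a start offset; find_all is a hypothetical name, and it does not reproduce InStringx's actual parameters, which aren't shown in the thread:

```python
def find_all(buffer, needle):
    """Return every position of `needle` in `buffer`, left to right."""
    if not needle:
        # An empty needle would match at the same spot forever
        raise ValueError("needle must not be empty")
    positions = []
    start = 0
    while True:
        pos = buffer.find(needle, start)  # -1 when there are no more matches
        if pos == -1:
            break
        positions.append(pos)
        # Resume just past this match; otherwise the same hit is found every time
        start = pos + len(needle)
    return positions

html_text = '<a href="x">1</a><a href="y">2</a>'
print(find_all(html_text, 'href="'))  # [3, 20]
```

The while loop plays the same role as a MASM .while loop around the search call; advancing the start position past each match is what keeps it from finding the same string over and over.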
As you can see, I'm just beginning to understand how it works and haven't fully tested it.
To tell the truth, I looked at it for a long time, going line by line and thinking through what would happen.
But I haven't compiled and tested it yet; I want to understand what happens before I copy/paste.
I don't know what you think about this, but I'm sure you also don't copy/paste code without understanding it.

But as you see, I'm willing to learn, and I now use a shorter way to parse the links.
Thanks for the hint about .while  8)

Too bad today is Sunday; I can't program anything, my girlfriend also needs some time with me  :shock: but I will try on Monday evening to solve the Unicode problem and post the results here.

Wishing you all a nice Sunday, the sun is shining here  8)

PS: the spell check is a bit weird from time to time :-)
Posted on 2006-04-30 03:41:52 by xanthos