Hello ASMCommunity.  As the subject states, I am having problems with Protected Mode setup, more specifically with the steps that follow loading the GDT.  My main problem lies in synchronizing the data segment register(s) with my data descriptor, and the code segment (via a far jump) with the code descriptor in my table.  The odd thing is that my code does execute the instructions after LGDT, such as a message-printing routine, and then enables protected mode.

Here I am stuck without my usual debugging option (printing a message to the screen to see where the troublesome part of the code is), since the BIOS interrupts I rely on are not available in Protected Mode.

Could the problem be that I have the LGDT 'nested' in its own separate function, which is called by the main program and then returns to it?  I seriously question this, however, as the print function that follows works just fine.

Here is the GDT setup scrap of my code:
gdtset:		;function to setup and load the Global Descriptor Table
xor ax, ax ;clear ax register
mov es, ax
mov si, gdt ;start of GDT table into SI register
mov di, ;locate GDT at 500h in memory
mov cx, ;size of the GDT (defined by fancy footwork)
push ds ;remember pre-modified DS register
mov ds, ax

cld ;clear the direction flag
rep movsb ;copy CX bytes from DS:SI to ES:DI

pop ds ;restore pre-modified DS register

lgdt ;load the Global Descriptor Table

lea si, ;load the address of the GDT Setup Message into SI
call prntstr ;print the string
ret



Just to give a little more information on where everything is at...
    7C00  ->  7E00    :Bootsector
    7E00  ->  79FF      :Bootloader    (this code)
    9200  ->  _undetermined_    :Kernel

    79FF  ->  7BFF    :Stack    (extending down from 7BFF obviously)
    500    ->  _undetermined_    :GDT


You probably should have access to the full source code in its actual context, so I am providing an attachment file.

For months now, this has baffled me.  It is just so bizarre to me, when many of the online tutorials use almost the exact same setup.
Posted on 2010-01-18 09:58:54 by XeonX369
I've skimmed through and have noticed five things.

First, and not of great importance, you are setting the "accessed" flag for your GDT entries (0x9B & 0x93 instead of 0x9A & 0x92). This is not necessary as the processor will set this flag upon use.
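To see what the difference between those access bytes amounts to, here is a minimal sketch that splits an access byte into its fields. It uses Python purely as a calculator; the function name `decode_access_byte` is made up for this illustration and does not come from the attached source. It assumes an ordinary code/data descriptor (S bit set).

```python
def decode_access_byte(value):
    """Split the access byte of a code/data segment descriptor into fields."""
    return {
        "present": bool(value & 0x80),       # bit 7: P
        "dpl": (value >> 5) & 0x3,           # bits 6-5: descriptor privilege level
        "code_or_data": bool(value & 0x10),  # bit 4: S (1 = code/data, 0 = system)
        "executable": bool(value & 0x08),    # bit 3: 1 = code segment, 0 = data segment
        "dc": bool(value & 0x04),            # bit 2: conforming (code) / expand-down (data)
        "rw": bool(value & 0x02),            # bit 1: readable (code) / writable (data)
        "accessed": bool(value & 0x01),      # bit 0: set by the CPU when the segment is used
    }


# The four values mentioned above: only the "accessed" field differs
# between 0x9A and 0x9B, and between 0x92 and 0x93.
for value in (0x9A, 0x9B, 0x92, 0x93):
    print(hex(value), decode_access_byte(value))
```

The pairs differ only in bit 0, which is why pre-setting it is harmless but unnecessary.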

Second, and of relative importance, your ORG statement doesn't seem to reflect an actual load address. jmp code:.bootkrnl is probably jumping closer to the bottom (0x0000) of memory than up near 0x7XXX. The same goes for lgdt or any other absolute address reference.

Third, and of great importance, your stack (0x7BFF) is set to be immediately above your code. For sanity reasons, it should be moved further up or down. Moving the stack down to 0x7000 will give you enough breathing room to do what you need to do, but should fail out fairly reliably in case there is some nasty stack corruption. Also, make sure you align your stack on a DWORD boundary, e.g. 0x7C00-0x7A00 instead of 0x7BFF-0x79FF. Remember, a push will decrement and then move, and a pop will move and then increment.

Fourth, and of greater importance, I don't see a demonstration of the segment registers being loaded immediately after the jump to protected mode.

Fifth, and of greatest importance, for maximum stability, there should be no calls/rets, stack usage or anything else between setting the Protection Enable (PE) flag in CR0 and jumping to 32-bit protected mode. You should inline pmoden.

Here is an example of the fourth and fifth points...



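;... some RM init code...
 mov eax,cr0
 or al,1                        ;set the PE bit (bit 0)
 mov cr0,eax
 jmp GDT_CODE:PROTECTED_MODE    ;far jump reloads CS with the code selector

BITS 32                         ;everything below runs as 32-bit code
PROTECTED_MODE:
 mov ax,GDT_DATA                ;data selector goes into every data segment register
 mov ds,ax
 mov es,ax
 mov fs,ax
 mov gs,ax
 mov ss,ax
 mov eax,STACK_BASE             ;flat 32-bit stack pointer
 mov esp,eax
 mov ebp,eax
;... some PM init code...
 jmp KERNEL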
Posted on 2010-01-18 12:13:59 by SpooK
Hello Spook and thank you for replying.  To start off, I took your advice for the accessed flag in the GDT entries and decided to change them for minor 'aesthetic' reasons, and I thank you for pointing that out.

With the ORG directive, I simply didn't realize that this should be an actual address, and was told on another forum site (OSDev, I believe) that I could get away with an offset from where it was loaded (7E00) and that this was recommended.  I suppose that this wasn't the wisest thing, now that you have given an explanation.

The double word alignment does make sense to me, and I now have my stack set at 7C00, expanding downwards.  Please let me know if this is an acceptable position, as I have had issues with the stack concept in the past (mostly positioning).  I kept my SP set to 0200h to denote a 512-byte stack.  Is this a good size?

I realize that I should have inserted some register 'refreshing' code, but I was a bit over-zealous with the 'undo' function this morning and must have scrapped this in the version that I uploaded.  I have promptly re-inserted it.  I load the Data Segment registers with a 'variable' in the GDT section which is specified simply as 'data.' 
pmoden:			;inline function to enable Protected Mode
cli ;disable interrupts
mov eax, cr0 ;move contents of ControlRegister0 into ExtendedAX register
or al, 01h ;set the PE (Protection Enable) bit
mov cr0, eax ;write the modified value back to CR0

jmp code:.pmode ;jump to 32 bit code to straighten out EIP


.pmode
mov ax, data ;align all data segments with the data descriptor
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax

mov ax, 7C00h ;align stack segment
mov ss, ax
mov eax, 0200h ;align base and stack pointers
mov esp, eax
mov ebp, eax

jmp code:9200h ;jump to residence of Kernel
hlt ;halt processor from executing this binary


The frustrating thing is that this version of the code does not even want to boot.  I can't get it to go in Qemu or Bochs, and it won't run on my PC hardware off of floppy.  I am still stupefied by it, and would appreciate more help.  Am I perhaps loading the segment registers incorrectly?

Also, as an aside question, do you recommend setting up a comprehensive segmentation layout for 'General Code/Data', Kernel Code/Data, Stack, Bootloader/Bootsector, and Descriptor Tables?  Would this just be 'junk' with the implementation of Paging, or would it be even better to do alongside Paging?

Many, Many Thanks for your assistance.
Posted on 2010-01-18 14:27:31 by XeonX369

With the ORG directive, I simply didn't realize that this should be an actual address, and was told on another forum site (OSDev, I believe) that I could get away with an offset from where it was loaded (7E00) and that this was recommended.  I suppose that this wasn't the wisest thing, now that you have given an explanation.


If CS and DS were 0x07E0 (segment base 0x7E00), and the IP was 0x0000, then using ORG 0x0000 should work for your loader... while in Real Mode, that is. Most people enable Protected Mode with one big flat address space (one big segment) instead of using legacy segmentation techniques.


The double word alignment does make sense to me, and I now have my stack set at 7C00, expanding downwards.  Please let me know if this is an acceptable position, as I have had issues with the stack concept in the past (mostly positioning).  I kept my SP set to 0200h to denote a 512-byte stack.  Is this a good size?

I realize that I should have inserted some register 'refreshing' code, but I was a bit over-zealous with the 'undo' function this morning and must have scrapped this in the version that I uploaded.  I have promptly re-inserted it.  I load the Data Segment registers with a 'variable' in the GDT section which is specified simply as 'data.'  

;...
mov ax, 7C00h ;align stack segment
mov ss, ax
mov eax, 0200h ;align base and stack pointers
mov esp, eax
mov ebp, eax

jmp code:9200h ;jump to residence of Kernel
hlt ;halt processor from executing this binary



In Protected Mode, SS expects a segment selector (an index into the GDT) rather than a Real Mode segment value. It is usually set to one large flat segment like code and data, making ESP & EBP absolute 32-bit pointers for the entire address space.

The above code should hang/triple-fault as soon as SS is loaded, since selector 0x7C00 refers to GDT index 3968 (byte offset 31744), far beyond the end of your table. Also, IIRC, the max for a GDT is 8192 entries.

Review the code I posted in my previous reply; STACK_BASE would be 0x7C00 in your case.
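As a rough sketch of why 7C00h cannot work as a protected-mode SS value, the snippet below splits a selector into its index, table-indicator and privilege fields and checks it against a GDT limit. Python is used only as a calculator here. The three-entry table (null, code, data) and the names `decode_selector`, `selector_fits_gdt` and `GDT_LIMIT` are assumptions for illustration; the actual table lives in the attached source.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": selector >> 3,      # bits 15-3: descriptor index
        "ti": (selector >> 2) & 1,   # bit 2: 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # bits 1-0: requested privilege level
    }


def selector_fits_gdt(selector, gdt_limit):
    """True if the whole 8-byte descriptor lies within the GDT limit."""
    index = selector >> 3
    return index * 8 + 7 <= gdt_limit


# Assumed layout: null descriptor, code descriptor, data descriptor.
GDT_LIMIT = 3 * 8 - 1   # the limit loaded by LGDT is the table size minus one

for selector in (0x08, 0x10, 0x7C00):
    print(hex(selector), decode_selector(selector),
          "ok" if selector_fits_gdt(selector, GDT_LIMIT) else "outside the GDT")
```

With such a table, 08h and 10h are the only usable non-null selectors, while 7C00h points at index 3968 and fails the limit check.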


The frustrating thing is that this version of the code does not even want to boot.  I can't get it to go in Qemu or Bochs, and it won't run on my PC hardware off of floppy.  I am still stupefied by it, and would appreciate more help.  Am I perhaps loading the segment registers incorrectly?


Start putting in dummy code that "hangs" and cycles some video character (0x000B8000) data in a loop. Temporarily add a jump to said dummy code after each significant portion of initialization. If you see the 2 top-left cells cycling through characters/colors, it is working up to that point and you can proceed to the next probable point of failure.

For example:



hang16:
mov ax,0xB800
mov gs,ax
mov bx,0
inc DWORD [gs:bx]
jmp hang16


hang32:
inc DWORD[0x000B8000]
jmp hang32
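To illustrate why incrementing one DWORD makes the two top-left cells cycle, here is a small simulation of the first four bytes of text-mode video memory. It is only a sketch in Python; `inc_dword` is a hypothetical helper standing in for the `inc DWORD` instruction, and the starting cell contents are made up.

```python
def inc_dword(memory, offset):
    """Emulate `inc DWORD [offset]` on a little-endian byte buffer."""
    value = int.from_bytes(memory[offset:offset + 4], "little")
    value = (value + 1) & 0xFFFFFFFF          # wrap around like a 32-bit register
    memory[offset:offset + 4] = value.to_bytes(4, "little")


# Two text cells, each stored as (character byte, attribute byte).
cells = bytearray([ord("A"), 0x07, ord("B"), 0x07])

for _ in range(3):
    inc_dword(cells, 0)

# The first character has advanced; the other bytes only change
# when a carry ripples into them.
print(cells.hex(" "))
```

The first character changes on every pass, its colour attribute changes every 256 passes, and the second cell only after the first two bytes overflow, so a running loop shows up as a rapidly flickering corner.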



Also, as an aside question, do you recommend setting up a comprehensive segmentation layout for 'General Code/Data', Kernel Code/Data, Stack, Bootloader/Bootsector, and Descriptor Tables?  Would this just be 'junk' with the implementation of Paging, or would it be even better to do alongside Paging?


I usually go with system (Ring 0) and user (Ring 3) code/data entries for all segment selectors, and a TSS entry for the task register in the case of running Ring 3 code. This effectively disables segmentation and the associated overhead, which has been mostly deprecated in Long Mode anyhow (FS/GS can still be used).
Posted on 2010-01-18 16:41:11 by SpooK
I once again thank you for your speedy response, with many thanks for setting up this forum as a place to discuss Assembly matters.  I will give your recommendations a try when I have access to my proper computing environment (I am on a poor-quality laptop now), which will be either later tonight or tomorrow.

Until then, I wish to ask: why exactly is segmentation completely scrapped in 64-bit long mode?  To me, this goes against the previous strategy of being "backwards compatible."  I realize that segmentation is quite old and almost never used in the mainstream today, but I wonder how one would efficiently implement multitasking.  I suppose that the logical response is software-based multitasking, but I once again ask for an aside piece of advice.  What is, in your opinion, the best means to set up software multitasking?

I am most likely getting a new computer this coming spring (a new laptop for university), with a 64-bit processor obviously, and am looking forward to taking my current project to the 64-bit platform.  Does this work with the same strategy as 32-bit mode, where it is best to enable it sooner rather than later?  I know that Paging is a necessity in this case, but is it easy to set up with no Interrupt routines?
Posted on 2010-01-18 20:00:42 by XeonX369

Until then, I wish to ask: why exactly is segmentation completely scrapped in 64-bit long mode?  To me, this goes against the previous strategy of being "backwards compatible."  I realize that segmentation is quite old and almost never used in the mainstream today, but I wonder how one would efficiently implement multitasking.  I suppose that the logical response is software-based multitasking, but I once again ask for an aside piece of advice.  What is, in your opinion, the best means to set up software multitasking?


Probably because segmentation was a solution for a now obsolete problem. Making segmentation less significant helps decrease processing overhead... there is less need for some heavy "sanity" checks.

Hardware multitasking, dealing with LDTs, etc... have proven to be less efficient than software multitasking on the x86. Software can make smarter decisions on what needs to be preserved during a task switch, which can make a decent impact on performance. Also, the need for more than 8K processes/threads quickly renders hardware multitasking useless.


I am most likely getting a new computer this coming spring (a new laptop for university), with a 64-bit processor obviously, and am looking forward to taking my current project to the 64-bit platform.  Does this work with the same strategy as 32-bit mode, where it is best to enable it sooner rather than later?  I know that Paging is a necessity in this case, but is it easy to set up with no Interrupt routines?


That depends on the design.

Personally, I utilize Real Mode and the BIOS as much as possible prior to initializing Long Mode. That way, the downtime for interrupt/exception processing is minimized... and you still have good old CTRL+ALT+DEL available.

As for the jump to Long Mode, you can actually jump to it straight from Real Mode, skipping Protected Mode altogether.

A big advantage of developing for the x86_64 architecture is the advanced baseline. With 32-bit Protected Mode, you have the 386, 486, Pentium, etc... each adding a little bit more. Is there an FPU, or isn't there? Is there PCI, or ISA only? RDTSC, CPUID, single page invalidation, etc.

With the x86_64, I am virtually guaranteed not to deal with the ISA bus, to have PCI, Local APIC, SYSCALL, MMX/SSE and all the other useful instructions added since the 80386. I can also still run 32-bit applications natively with Compatibility Mode.

Overall, the x86_64 is a really decent design, allowing us to make a slow but safe march into mainstream 64-bit computing while making the process fun (in more than one way) for developers.
Posted on 2010-01-19 01:42:15 by SpooK
Hello again Spook,
I am back at my workstation and have tried your recommendations... to no avail.  I still get the same result: it appears not to boot.  I tried out your debug code, inserting it just before the code that enables protected mode.  The strangest thing happened: execution apparently reached it.  So nothing fails until after this point, but I found it interesting that this is the case when the message-printing functions just do not work.  For a visual 'demonstration,' I have attached a picture (labeled "nonfunct_debugcode.png").

I decided to try and 'reverse' the additions which I had made, and another ambiguity popped up.  With the ORG directive set to the old value of 0000h, everything works until the LGDT instruction, for reasons which you have explained.  I just wonder why the new ORG 7E00h directive appears to be non-functioning, when the debug code runs through fine.  I have also attached screenshots of the results using the separate directives.

Just to double check my stack setup, am I doing things correctly?

.pmode
mov ax, data ;align all data segments with the data descriptor
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax

mov ss, ax ;align stack segment with data descriptor
mov eax, 7C00h ;align base and stack pointers
mov esp, eax
mov ebp, eax


I don't know where I had been told that SS was to take the value of the "highest" byte that the stack would grow down from, and that (e)sp and (e)bp were to be the total 'size' of the stack.  My knowledge of the stack is seriously sketchy now.  The values of the SS (7C00) and SP (0200) registers previously set in 16-bit mode had reinforced this interpretation of these registers.  Oh well, I find that such ambiguities make this process fun.

I have included my most recent full source code for this, with what little changes I have made.
Posted on 2010-01-19 10:05:47 by XeonX369
I have used your debug instruction more thoroughly, with the exact instruction causing an 'error' being
mov cr0, eax

in the routine to enable Protected Mode.

I still find it quite odd that the debug appears to be reached and yet all of the printing functions are not being reached.

I will continue to search the internet for information and look forward to your reply.
Posted on 2010-01-20 10:59:06 by XeonX369

I have used your debug instruction more thoroughly, with the exact instruction causing an 'error' being
mov cr0, eax

in the routine to enable Protected Mode.


It could also mean that the subsequent jump to Protected Mode is failing, as well.

Put the following code at the very beginning of your loader...


START:
cli
mov bx,START
jmp $


... run it in Bochs and check out the log file. Make sure the value of BX in the log is the same as the expected/desired load address.


I still find it quite odd that the debug appears to be reached and yet all of the printing functions are not being reached.


It could be a problem similar to above. It could also be the fact that the direction flag is not being purposely cleared with CLD before using LODSB.


I will continue to search the internet for information and look forward to your reply.


Bootloader Tutorial @ OSDev.org
Posted on 2010-01-20 16:03:04 by SpooK
Running Bochs with the "check BX" code, the register dump is:
00095536000i | RAX=0000000000000e00  RBX=0000000000007e00
00095536000i | RCX=0000000000090002  RDX=0000000000000000
00095536000i | RSP=0000000000000200  RBP=0000000000000000
00095536000i | RSI=00000000000e7c9a  RDI=000000000000ffac
00095536000i |  R8=0000000000000000  R9=0000000000000000
00095536000i | R10=0000000000000000  R11=0000000000000000
00095536000i | R12=0000000000000000  R13=0000000000000000
00095536000i | R14=0000000000000000  R15=0000000000000000
00095536000i | IOPL=0 id vip vif ac vm rf nt of df if tf sf ZF af PF cf
00095536000i | SEG selector    base    limit G D
00095536000i | SEG sltr(index|ti|rpl)    base    limit G D
00095536000i |  CS:7e00( 0004| 0|  0) 0007e000 0000ffff 0 0
00095536000i |  DS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00095536000i |  SS:7c00( 0005| 0|  0) 0007c000 0000ffff 0 0
00095536000i |  ES:7e00( 0005| 0|  0) 0007e000 0000ffff 0 0
00095536000i |  FS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00095536000i |  GS:0000( 0005| 0|  0) 00000000 0000ffff 0 0
00095536000i |  MSR_FS_BASE:0000000000000000
00095536000i |  MSR_GS_BASE:0000000000000000
00095536000i | RIP=0000000000000004 (0000000000000004)
00095536000i | CR0=0x60000010 CR2=0x0000000000000000
00095536000i | CR3=0x00000000 CR4=0x00000000


What I found of interest here is that Bochs says that the base of the DS register is 0000 and that of CS is 0007E000.  Am I not setting my registers as I think I am, or am I just misreading?

The BX register indeed does contain the value intended.  Does the R prefix of the registers mean that I am running Bochs in 64-bit emulation mode?  I just started the configuration of Bochs today, so it was a bit rushed.
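One way to read the CR0 value in that dump is to decode its flag bits. The sketch below does so in Python, used only as a calculator; `decode_cr0` and `enable_protected_mode` are names invented for this illustration. The bit positions are the architectural ones for CR0.

```python
CR0_BITS = {
    "PE": 0,    # Protection Enable
    "MP": 1,    # Monitor Coprocessor
    "EM": 2,    # Emulation
    "TS": 3,    # Task Switched
    "ET": 4,    # Extension Type
    "NE": 5,    # Numeric Error
    "WP": 16,   # Write Protect
    "AM": 18,   # Alignment Mask
    "NW": 29,   # Not Write-through
    "CD": 30,   # Cache Disable
    "PG": 31,   # Paging
}


def decode_cr0(value):
    """Return the set of CR0 flag names that are set in `value`."""
    return {name for name, bit in CR0_BITS.items() if (value >> bit) & 1}


def enable_protected_mode(value):
    """What `or al, 1` followed by `mov cr0, eax` does to the CR0 value."""
    return value | 1


dump_cr0 = 0x60000010                       # CR0 from the Bochs log above
print(sorted(decode_cr0(dump_cr0)))
print(hex(enable_protected_mode(dump_cr0)))
```

PE and PG are both clear in 0x60000010, so the CPU was still in Real Mode at that point, whatever register names the dump uses.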

Also, I feel that in order to give a better view of my current environment state,  I should show you my bootsector, and current bootloader.  They are as an attachment file.

In light of my switch to a flat segment (to facilitate long mode in the future), should I change the segment registers at the beginning of the bootloader code into a flat 0000 like the ones in the bootsector?
Posted on 2010-01-21 14:02:03 by XeonX369

What I found of interest here, is that bochs says that the base of the DS register is 0000 and CS is 0007E000.  Am I not setting my registers as I think I am, or am I just mis-reading?


jmp 7E00h:boot in your loader code answers the above. This is a 16-bit far jump; the first word sets CS and the second word sets IP.

Note that in x86 Real Mode segmentation, a segment's base address is the value of the segment register * 16. 0x7E00:0x0000 = 0x0007E000 and not 0x00007E00.

Change it to jmp 0000:7E00h and you now have one less problem :)

This is also duplicated in the boot code at the end as jmp 7E00h:0000h. This will need fixing as well.

Finally, you are repeating this mistake in your bootsector readdsk function. Notice that DS is set to zero in your Bochs log, this is why no data references are working as you would expect... your data is over 400KB away from where your code thinks it is.
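The arithmetic behind these points can be checked with a few lines. This is only a sketch in Python used as a calculator, with `linear_address` being a made-up helper name; it ignores A20 wrap-around, which does not matter for addresses below 1MB.

```python
def linear_address(segment, offset):
    """Real Mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


wrong = linear_address(0x7E00, 0x0000)   # what jmp 7E00h:0000h selects
flat = linear_address(0x0000, 0x7E00)    # what jmp 0000h:7E00h selects
alt = linear_address(0x07E0, 0x0000)     # another pair naming the same byte

print(hex(wrong), hex(flat), hex(alt))
print("distance in KB:", (wrong - flat) / 1024)
```

The two spellings of "7E00" land 472.5KB apart, which is the "over 400KB" gap between where the code runs and where DS = 0 looks for its data.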
Posted on 2010-01-21 18:04:40 by SpooK
Thanks Spook for the information on the Real Mode addressing issues I have been having.  I have checked up on this form of addressing and have found that the segment should be multiplied by 16 and the offset added.  I found this to be quite easy (in hex).

My code now runs full through until the jump to the Kernel space.  I have decided to load the kernel into 100000h address, as a result of a recommendation from others.

The code is:
jmp code:100000h ;jump to residence of Kernel


I hope that this is correct, and falls in line with the rest of the code that I corrected.  I have the 'code' segment as the base (which is 0, correct?) * 16  and then add the offset (which is the address to the loading point of the kernel).

I really do hope that I am doing this correctly.  Just to check and see, the kernel loading code is as so.
.rddsk
mov ax, 0xFFFF ;load FFFFh into AX register (can't directly manipulate ES)
mov es, ax ;set ES to contents of AX (location of memory to read to)
mov bx, 0010h ;set offset to 0010h

mov ah, 02h ;place 02h (read function) into AH
mov al, 12h ;read 18 sectors
mov ch, 00h ;read from cylinder 0
mov cl, 04h ;read from sector 4
mov dh, 00h ;read from head 0
mov dl, ;read from drive

int 13h ;call int 13h to read from disk


I appreciate your continued help in this slight problem.  So close ... yet so far...
Posted on 2010-01-25 10:48:45 by XeonX369
Think about 0xFFFF:0x0010, where that is, and what it means. Hint.

Analyze the maximum amount of addressable address space while using 16-bit (es:bx) addressing.
Posted on 2010-01-25 11:01:03 by SpooK
With the FFFFh:0010h:
          FFFFh * 10h = FFFF0h
          FFFF0h + 10h = 100000h

This means that it will be above the addressable limits of 16 bit address mode:
          2h ^ 10h = 10000h

But doesn't the A20 line open up a bunch of new memory?  Why do many tutorials that I see enable A20, and then load the  kernel before 32-bit protected mode is enabled?

Or is it that A20 provides:
          2h ^ 14h = 100000h

If my above supposition is true, then how will I actually load the kernel if in protected mode, you cannot use interrupts?  Should I switch to my old 9200h setup (without the errors)?  Also, what stack size is recommended for a 32-bit kernel?

Thanks.
Posted on 2010-01-25 11:17:42 by XeonX369

With the FFFFh:0010h:
          FFFFh * 10h = FFFF0h
          FFFF0h + 10h = 100000h

This means that it will be above the addressable limits of 16 bit address mode:
          2h ^ 10h = 10000h


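IIRC, with the A20 line disabled, Real Mode addresses wrap around at the 1MB boundary. This means that 0xFFFF:0x0010 = 0x00000000 and not 0x00100000. With A20 enabled it does reach 0x00100000, but segment:offset addressing still tops out at 0x0010FFEF, just under 64KB above the 1MB mark.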

If you are looking for a fairly reliable way to load data beyond the 1MB mark while in 16-bit Real Mode, look into Unreal Mode.
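For the FFFFh:0010h case specifically, the effect of the A20 gate can be sketched as a mask on address bit 20. The snippet uses Python as a calculator only; `physical_address` is a hypothetical helper, and the model assumes the classic behaviour where a disabled A20 line forces bit 20 low.

```python
def physical_address(segment, offset, a20_enabled):
    """Real Mode address as seen on the bus, with or without A20."""
    linear = (segment << 4) + offset     # can be as large as 0x10FFEF
    if not a20_enabled:
        linear &= 0xFFFFF                # bit 20 forced low: wrap at 1MB
    return linear


print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=False)))
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))
print(hex(physical_address(0xFFFF, 0xFFFF, a20_enabled=True)))
```

Under this model an 18-sector read to FFFFh:0010h lands at the very bottom of memory when A20 is off, and even with A20 on only the region up to 10FFEFh is reachable from Real Mode.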
Posted on 2010-01-25 13:57:41 by SpooK
And if I decided to load the kernel to 9200h, what then would I do?
I have tried both, with the obvious 100000h 'problem' ("unreal" mode seems to be not elegant (OCD, anyone)), and 9200h does the same triple fault.

Perhaps I ought to look at the kernel for a code error.  I know it should be printing an information string.

EDIT:  The kernel is not mis-behaving and indeed should print a string.  I have clarified in the past that it works with GRUB, but I just hate using other people's software.

EDIT:  The problem indeed lies with the jump instruction.  It will not accept any values that I give it, even if the kernel is loaded into a section of memory under 64k.
Posted on 2010-01-25 15:20:46 by XeonX369
I believe that I have found a cause (or at least a partial one).  My kernel file is over 36864 bytes in size, when it is only composed of a simple string printing routine, and a clear screen routine, as well as a 'pre-kernel' which is the actual code jumped to.

Here is where things get weird for me:  assembling the 'pre-kernel' with the aout format will yield a file size which is expected.  Linking then gives an oddly large file size, as stated above.  Assembling the 'pre-kernel' in elf format will give about the same expected size; and yet when linked, the file is an expected about 3KB.  I have my linker script setup to specify the correct entry point, and the place to be loaded to, as well as the sections of text and bss.

Any thoughts on this?
Posted on 2010-01-28 11:39:56 by XeonX369