I am working on an embedded board with a MIPS architecture, and I have a requirement to initialize the 128MB RAM by writing zeros to the memory. I have coded the following:
// RAM initialization part
li t2, 0x80000000 //RAM starting address
li t3, 0x08000000 //RAM size in bytes (128MB); the loop subtracts 4 per word stored
li t4, 0x00000000 //Value we need to initialize
ram_init:
sw t4, 0(t2)
addi t2, t2, 0x00000004
addi t3, t3, -0x00000004
bgtz t3, ram_init
nop
But with this code I am getting a performance issue: it takes about 70 to 80 seconds to write zeros to the memory. Is there a better way to do the same operation so that the time taken can be reduced to less than 5 seconds?
-BalaC-
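Taking the figures in the post at face value, 128 MB in 70 to 80 seconds is roughly 1.6 to 1.8 MB/s, while finishing in under 5 seconds needs more than 25.6 MB/s, so the goal is a speedup of about 14x to 16x. A minimal sketch of that arithmetic follows; the function name is a hypothetical helper, not something from the thread.

```c
#include <assert.h>

/* Hypothetical helper: average fill rate in MB/s when `megabytes` of RAM
 * are cleared in `seconds`. */
static double fill_rate_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}
```

For example, `fill_rate_mb_per_s(128.0, 5.0)` gives the minimum rate needed to meet the 5 second target.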
I think you may be limited by the branching there.
You could try unrolling the loop, so you fill a bunch of memory positions at a time.
Now I don't know MIPS asm, so this is just my guess at the code:
li t2, 0x80000000 //RAM starting address
li t3, 0x08000000 //RAM size in bytes (128MB); the loop subtracts 0x10 per 4 words stored
li t4, 0x00000000 //Value we need to initialize
ram_init:
sw t4, 0(t2)
sw t4, 4(t2)
sw t4, 8(t2)
sw t4, 12(t2)
addi t2, t2, 0x00000010
addi t3, t3, -0x00000010
bgtz t3, ram_init
nop
Something like that anyway; this unrolls it 4 times. If my theory is correct, this will hide the latency of the branch a bit and make more effective use of the store unit.
I guess the branch itself might be more efficient as well if you don't have the add and the branch right next to each other, but have the add a bit in advance. But I'd need to know more about the actual architecture for that.
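For readers more at home in C, the same 4x unrolling can be sketched as follows. The helper name `zero_words_unrolled4` and its parameters are illustrative assumptions, not from the thread. `volatile` is used so the compiler emits the individual stores instead of substituting a library call. Whether this is actually faster depends on the CPU, the cache setup, and the memory controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero `bytes` bytes starting at `base`, issuing four
 * 32-bit stores per loop iteration so the branch is paid once per 16 bytes.
 * `bytes` must be a multiple of 16. */
static void zero_words_unrolled4(volatile uint32_t *base, size_t bytes)
{
    assert(bytes % 16 == 0);

    volatile uint32_t *p = base;
    volatile uint32_t *end = base + bytes / sizeof(uint32_t);

    while (p != end) {
        p[0] = 0;   /* sw t4, 0(t2)  */
        p[1] = 0;   /* sw t4, 4(t2)  */
        p[2] = 0;   /* sw t4, 8(t2)  */
        p[3] = 0;   /* sw t4, 12(t2) */
        p += 4;     /* advance the pointer by 16 bytes */
    }
}
```

On the board in question the call would presumably look like `zero_words_unrolled4((volatile uint32_t *)0x80000000, 0x08000000)`, assuming that address range is mapped and safe to overwrite at that point in boot.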
Scali, thanks for your reply 8) I will try that one and get back.
Scali, that solution works, but it still takes about 40 to 45 seconds. Is there any other instruction available to perform a bulk transfer?
You may also want to check what condition that RAM is in on power-up. If it happens to be zeroed when powered off, it could be an option to rezero it whenever required.
Usually, RAM is expected to lose all data when powered off.
Raymond, thanks for your reply. As you have quoted:
Usually, RAM is expected to lose all data when powered off.
Here in my case I am not switching the board off; instead I just issue a reboot, so the contents of RAM are not cleared. I need to clear the RAM every time my board is rebooted.
-BalaC-
Hello guys,
I have found the corresponding solution for the x86 architecture. Can anyone help me convert the following code to the MIPS architecture?
xor eax, eax // Value to write into memory
mov es, ax
mov ecx, 2000000h // Number of times to loop [2000000h * 4]
mov edi, 80000000h // Starting address of memory to start writing
rep stosd es: // Instruction to perform memory write
Expecting a reply from you guys.
Thanks in advance.
-BalaC-
Nope.
MIPS is a RISC CPU. RISC stands for Reduced Instruction Set Computing.
Which means that they 'reduced' the instruction set by removing all sorts of superfluous instructions from it.
rep stosd is exactly the type of instruction that you remove in a RISC CPU.
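Since there is no single-instruction equivalent, what rep stosd does has to be written out as an explicit loop, which is what the MIPS code earlier in the thread already is. A minimal C model of the instruction, assuming the direction flag is clear and ignoring segment registers, might look like this; `rep_stosd_model` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rep stosd" with the direction flag clear:
 * store `value` (eax) at `dst` (edi), advance `dst` by 4 bytes, and repeat
 * `count` (ecx) times. Returns the final destination pointer. */
static uint32_t *rep_stosd_model(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);

    while (count != 0) {
        *dst = value;   /* mov [edi], eax */
        dst += 1;       /* add edi, 4     */
        count -= 1;     /* dec ecx        */
    }
    return dst;
}
```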
Scali, thanks again for your input.
I think I need to look for some other alternative (like DMA) :sad: to achieve this.
-BalaC-
stosb == mov [edi],al
add/sub edi,1 ;adds or subtracts based on direction flag
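The two-step breakdown above can be sketched in C as a single function. This is an illustrative model only: `stosb_model` and its parameter names are assumptions, and segment handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one "stosb": write the byte in al to [edi], then
 * move edi by one byte, forward when the direction flag is clear and
 * backward when it is set. Returns the updated pointer. */
static uint8_t *stosb_model(uint8_t *edi, uint8_t al, bool direction_flag)
{
    assert(edi != NULL);

    *edi = al;                                  /* mov [edi], al   */
    return direction_flag ? edi - 1 : edi + 1;  /* sub/add edi, 1  */
}
```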