Windows XP Master Boot Record

Here is a hex dump of the MBR of drive C of a Windows XP computer, located in the first 512 bytes of the hard drive:

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000  33 C0 8E D0 BC 00 7C FB 50 07 50 1F FC BE 1B 7C  3ÀŽÐ¼.|ûP.P.ü¾.|
0010  BF 1B 06 50 57 B9 E5 01 F3 A4 CB BD BE 07 B1 04  ¿..PW¹å.ó¤Ë½¾.±.
0020  38 6E 00 7C 09 75 13 83 C5 10 E2 F4 CD 18 8B F5  8n.|.u.ƒÅ.âôÍ.‹õ
0030  83 C6 10 49 74 19 38 2C 74 F6 A0 B5 07 B4 07 8B  ƒÆ.It.8,tö µ.´.‹
0040  F0 AC 3C 00 74 FC BB 07 00 B4 0E CD 10 EB F2 88  ð¬<.tü»..´.Í.ëòˆ
0050  4E 10 E8 46 00 73 2A FE 46 10 80 7E 04 0B 74 0B  N.èF.s*þF.€~..t.
0060  80 7E 04 0C 74 05 A0 B6 07 75 D2 80 46 02 06 83  €~..t. ¶.uÒ€F..ƒ
0070  46 08 06 83 56 0A 00 E8 21 00 73 05 A0 B6 07 EB  F..ƒV..è!.s. ¶.ë
0080  BC 81 3E FE 7D 55 AA 74 0B 80 7E 10 00 74 C8 A0  ¼.>þ}Uªt.€~..tÈ 
0090  B7 07 EB A9 8B FC 1E 57 8B F5 CB BF 05 00 8A 56  ·.ë©‹ü.W‹õË¿..ŠV
00A0  00 B4 08 CD 13 72 23 8A C1 24 3F 98 8A DE 8A FC  .´.Í.r#ŠÁ$?˜ŠÞŠü
00B0  43 F7 E3 8B D1 86 D6 B1 06 D2 EE 42 F7 E2 39 56  C÷ã‹Ñ†Ö±.ÒîB÷â9V
00C0  0A 77 23 72 05 39 46 08 73 1C B8 01 02 BB 00 7C  .w#r.9F.s.¸..».|
00D0  8B 4E 02 8B 56 00 CD 13 73 51 4F 74 4E 32 E4 8A  ‹N.‹V.Í.sQOtN2äŠ
00E0  56 00 CD 13 EB E4 8A 56 00 60 BB AA 55 B4 41 CD  V.Í.ëäŠV.`»ªU´AÍ
00F0  13 72 36 81 FB 55 AA 75 30 F6 C1 01 74 2B 61 60  .r6.ûUªu0öÁ.t+a`
0100  6A 00 6A 00 FF 76 0A FF 76 08 6A 00 68 00 7C 6A  j.j.ÿv.ÿv.j.h.|j
0110  01 6A 10 B4 42 8B F4 CD 13 61 61 73 0E 4F 74 0B  .j.´B‹ôÍ.aas.Ot.
0120  32 E4 8A 56 00 CD 13 EB D6 61 F9 C3 49 6E 76 61  2äŠV.Í.ëÖaùÃInva
0130  6C 69 64 20 70 61 72 74 69 74 69 6F 6E 20 74 61  lid partition ta
0140  62 6C 65 00 45 72 72 6F 72 20 6C 6F 61 64 69 6E  ble.Error loadin
0150  67 20 6F 70 65 72 61 74 69 6E 67 20 73 79 73 74  g operating syst
0160  65 6D 00 4D 69 73 73 69 6E 67 20 6F 70 65 72 61  em.Missing opera
0170  74 69 6E 67 20 73 79 73 74 65 6D 00 00 00 00 00  ting system.....
0180  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0190  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
01A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
01B0  00 00 00 00 00 2C 44 63 03 41 05 55 00 00 00 01  .....,Dc.A.U....
01C0  01 00 12 EF BF 62 3F 00 00 00 F1 F6 8C 00 80 00  ...ï¿b?...ñöŒ.€.
01D0  81 63 07 EF FF FF 30 F7 8C 00 D0 5B 1B 04 00 00  .c.ïÿÿ0÷Œ.Ð[....
01E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
01F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA  ..............Uª

(Hex dump generated by HxD)

Analysis

The BIOS loads this sector to 0000:7C00 and jumps to the start of the code, so the computer starts executing it. Because the computer is still running in real mode, the code is in the original 8088 machine language, relying only on 16-bit registers and segments. Well, actually, it uses some opcodes (such as pusha, popa, and push with an immediate operand) that were introduced with its immediate successor, the 80186, but it is still working in 16-bit real mode.

The code sets up the stack segment at 0000 and the stack pointer at 7C00, so the stack starts right below the code itself (remember, stack grows downwards, so any pushes or hardware interrupts will not overwrite the code). Most of the MBRs I have seen disable hardware interrupts using the CLI op-code but, for some reason, this one does not. That is rather strange, as you really do not want a hardware interrupt to happen after you have changed the stack segment and before you have changed the stack pointer. But it enables hardware interrupts as soon as it moves the stack. So, Microsoft seems to expect the BIOS to disable them. Or, maybe it’s just a bug.

Note: I use NASM syntax in my disassembly, as NASM is my assembler of choice. That means a dotted label is interpreted as the last undotted label plus the dotted label. So, in this code .main is the same as mbr.main because the only undotted label before it is mbr right at the start of the code, while .lba_check is the same as read_drive.lba_check because the last undotted label before it is read_drive. Additionally, I am showing you the machine language and its actual memory location, which would not be seen in the original assembly language code.

mbr:
0000:7C00  33C0              xor ax, ax        ; ax = 0
0000:7C02  8ED0              mov ss, ax        ; stack segment is at 0000
0000:7C04  BC007C            mov sp, 0x7c00    ; stack pointer at 7C00
0000:7C07  FB                sti               ; enable hardware interrupts

Next, it sets DS and ES (data segment, extra segment) to 0000. AX is still 0. The code could have used mov ds, ax and mov es, ax, just as it used mov ss, ax above; each of those takes two bytes, exactly as much as a push/pop pair. For whatever reason, this MBR pushes AX on the stack and then pops ES. It does the same with DS.

0000:7C08  50                push ax
0000:7C09  07                pop es            ; extra segment is at 0000
0000:7C0A  50                push ax
0000:7C0B  1F                pop ds            ; data segment is at 0000

At just 512 bytes for both code and data, the MBR is a very small piece of code. Its purpose is to find the actual boot code on the drive, replace itself with it, and run it. But if it just loaded the boot code to its own location, the boot code would overwrite it, and the MBR would never get a chance to execute the boot code.

So, the MBR copies its own main code to a lower memory area, starting at 0600. It then needs to jump to the newly copied code. This particular MBR does that by pushing AX (0) on the stack (the code segment (CS) of the copied code), then pushing DI (the desired instruction pointer (IP) of the copied code, which the listing comments call CP), and then executing a far return (RETF). When the microprocessor executes the RETF, it pops the top of the stack to IP, then the next word to CS, and continues at the new CS:IP.

That way, the same code will continue running but from the new memory location, allowing the MBR to (hopefully) find the boot code, load it at its old location, and execute it.

0000:7C0C  FC                cld
0000:7C0D  BE1B7C            mov si, 0x7c00 + .main - mbr
0000:7C10  BF1B06            mov di, 0x0600 + .main - mbr
0000:7C13  50                push ax           ; CS to return to
0000:7C14  57                push di           ; CP to return to
0000:7C15  B9E501            mov cx, mainlen
0000:7C18  F3A4              rep movsb         ; copy CX bytes from DS:SI
                                               ;     to ES:DI
0000:7C1A  CB                retf              ; "return" to the copy 
                                               ;     of mbr.main
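
The address arithmetic in this listing can be double-checked with a few lines of Python. This is only a sketch of my own: the constant names and the 64 KB bytearray standing in for segment 0000 are made up for illustration, and the sector contents are dummy bytes.

```python
# Model the relocation step: copy the tail of the MBR from 7C00 down to 0600.
LOAD_ADDR = 0x7C00      # where the BIOS loads the sector
COPY_ADDR = 0x0600      # where the MBR relocates itself
MAIN_OFFSET = 0x1B      # offset of .main inside the sector (from the listing)
SECTOR_SIZE = 0x200

si = LOAD_ADDR + MAIN_OFFSET          # source pointer
di = COPY_ADDR + MAIN_OFFSET          # destination pointer
cx = SECTOR_SIZE - MAIN_OFFSET        # mainlen: bytes from .main to the end

memory = bytearray(0x10000)           # stand-in for segment 0000
memory[LOAD_ADDR:LOAD_ADDR + SECTOR_SIZE] = bytes(range(256)) * 2  # dummy data

far_return = (0x0000, di)             # CS:IP pushed before the copy
memory[di:di + cx] = memory[si:si + cx]   # what rep movsb does with DF clear

print(hex(si), hex(di), hex(cx), [hex(v) for v in far_return])
```

The three values match the operands in the machine code (BE 1B 7C, BF 1B 06, and B9 E5 01), and the far return lands on the first copied byte.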

The MBR contains a partition table, located at 01BE (see the hex dump above), labeled partition_table in our disassembly, located physically at 0000:07BE (since 0600 + 01BE = 07BE). This table contains four records, each 16 bytes long. While other operating systems allow more than one bootable partition (so you can boot different operating systems at different times), a Microsoft MBR, like this one, expects that one and only one partition on the first drive (C) is bootable. Otherwise, it assumes there is an error (that is, if there is more than one bootable partition, or if there is not any).

For each of the four records, if its first byte is equal to 0, the corresponding partition is not bootable (or may not even exist, since not every drive has four partitions); if it is less than zero when read as a signed byte (that is, 80 hex or above, with the top bit set), it is bootable. Any other value is an error. If you take a look at the hex dump, you will see that my first partition is not bootable, but the second one is (see Note 1 for why).

The code loads the address of the partition table into BP and checks, up to four times, if the first byte is negative. If it is, it accepts the partition as bootable and jumps to mbr.bootable. If not, it checks whether the byte value is a zero, and if it is not, it rejects the entire system as unbootable and jumps to mbr.unbootable. Otherwise it adds 16 (hex 10) to BP and continues checking until all four records have been examined. If none of them checks out as bootable, the MBR issues interrupt 18 (hex). This interrupt never returns, so the work of the MBR is finished.

By the way, back when IBM came up with its first PC (which was in 1981), it placed a simple version of the BASIC language into its ROM. Int 18h started this BASIC interpreter. So, if the computer had no operating system to boot, it ran BASIC. None of the PC compatible computers had BASIC in their ROM, and none of the modern computers does. But, to this day, the MBR invokes int 18h if it cannot find something to boot from the drive. Typically, int 18h will just prompt you to press any key to reboot the computer.

.main:
0000:061B  BDBE07            mov bp, 0x0600 + partition_table - mbr
0000:061E  B104              mov cl, 4

.find_bootable_drive:
0000:0620  386E00            cmp [bp], ch       ; CH = 0 from rep movsb
0000:0623  7C09              jl .bootable
0000:0625  7513              jne .unbootable
0000:0627  83C510            add bp, byte +0x10 ; See Note 2
0000:062A  E2F4              loop .find_bootable_drive
0000:062C  CD18              int 0x18

Once it finds a bootable partition, you might think it will now load the boot code from it. But no, it still wants to check that this is the only bootable partition on the drive, and smite thee if it is not!

When it was looking for a bootable partition, the loop op code decreased the value of CX (which equals CL here, because CH is 0) every time it did not find one. Because of that, the MBR knows how many more partition records it has to examine.

It also added 16 (hex 10) to BP every time it did not find a bootable partition, but not when it found one. So, BP contains the pointer to the active partition record. The MBR needs to keep it in BP, so now it is going to use SI as a pointer to the rest of them. The first thing it does is copy the value in BP to SI. Then it goes through a small loop (though it does not use the loop op code here), each time decreasing the value of CX.

Once CX has reached 0, the MBR knows everything checks out, so it jumps to mbr.load_boot_code. But before CX reaches 0, it compares the first byte of the next partition record (whose address is in SI) to 0. If it is 0, it checks the next record, if any. But if it is not 0, it continues at mbr.unbootable, where it proclaims that the partition table is invalid, and then it effectively locks the computer by jumping into an endless loop. But since the hardware interrupts are enabled, you should still be able to press Ctrl-Alt-Del and reboot the computer (after you have inserted a diskette in the diskette drive or a CD in the CD drive since, of course, you cannot boot from the hard drive). The idea that you should only have one bootable partition is quite presumptuous and arrogant, as it implies that you should never install anything other than Windows on your system.
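
The two scans described above (finding the active record, then making sure it is the only one) can be modeled in Python. This is a sketch of my reading of the listing, not Microsoft's code; the function name and the returned labels are my own invention, and it assumes it is handed exactly the four status bytes.

```python
def check_partition_table(status_bytes):
    """Model the two loops: find the active record, then verify it is unique.

    status_bytes: the first byte of each of the four partition records.
    Returns ("boot", index), ("invalid", None) or ("int18", None).
    """
    def signed(b):                      # cmp followed by jl treats it as signed
        return b - 256 if b >= 0x80 else b

    for index, status in enumerate(status_bytes):
        if signed(status) < 0:          # jl .bootable: the top bit is set
            break
        if status != 0:                 # jne .unbootable: 01h to 7Fh
            return ("invalid", None)
    else:
        return ("int18", None)          # loop fell through: nothing bootable

    # .shall_I_smite_thee: every remaining record must start with 0
    for status in status_bytes[index + 1:]:
        if status != 0:
            return ("invalid", None)
    return ("boot", index)


print(check_partition_table([0x00, 0x80, 0x00, 0x00]))  # the table in the dump
print(check_partition_table([0x80, 0x80, 0x00, 0x00]))  # two active records
```

Note that the second loop only looks at the records after the active one, because the first loop has already vetted the records before it.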

.bootable:
0000:062E  8BF5              mov si, bp

.shall_I_smite_thee:
0000:0630  83C610            add si, byte +0x10
0000:0633  49                dec cx
0000:0634  7419              jz .load_boot_code
0000:0636  382C              cmp [si], ch      ; CH = 0
0000:0638  74F6              je .shall_I_smite_thee

The MBR code reaches this point if either the drive has more than one bootable partition, or it has one with a number greater than 0 in the first byte of its partition record. Either way, the MBR will print “Invalid partition table,” or its equivalent in another language, and then just freeze the system.

The code starts by getting the memory address of the first character of that message into SI. It goes about it in a roundabout way. It first looks up the lower 8 bits of the 16-bit address (remember, we are working in the real mode here, as if all we had was the 8088 microprocessor) in a look-up table located several bytes before the first partition record. It loads that value into AL, then places 7 into AH. So, if this text string is located at 0000:072C, which is 012C in the hex dump plus the 0000:0600, where the code and data were copied, the value found at invalid_partition_ptr will be a single byte of 2C hex, which, combined with the 7 in AH, will produce 072C in AX, the correct address of the first byte of the string.

I should add that doing it this indirect way is rather strange here. I have read that Microsoft produces the MBR in a number of languages, so each text string starts somewhere else in each version. But that explanation makes no sense. The assembler has to create a new look-up table for each such version, while it could just as easily hardcode the address of each string for each such version.

Anyway, with the address in AX, the code copies it to SI, and then starts a loop of fetching each character of the string into AL and showing it to the screen using the BIOS interrupt 10h. But before displaying the byte as a character, it compares it to zero (which is the first byte after the string). If it is a zero, it compares it to 0 again and again and again, until you either turn off the computer or reboot. This is how it freezes the computer after printing the message.

The same code is also used to print other messages and freeze. When the code needs to do that, it loads the low byte of the message address into AL and just jumps to mbr.print_and_freeze in this section of the code, as we will see later.
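
To make the pointer trick concrete, here is a small Python sketch that follows a one-byte pointer to its message the same way the code does. The function name is hypothetical, and the sector is rebuilt only partially (the three strings and the three pointer bytes) from the offsets visible in the hex dump.

```python
def message_at(sector, ptr_offset, copy_addr=0x0600):
    """Return the NUL-terminated message selected by a one-byte pointer."""
    low = sector[ptr_offset]              # mov al, [0x0600 + ..._ptr]
    address = (0x07 << 8) | low           # mov ah, 0x07 / mov si, ax
    start = address - copy_addr           # back to an offset in the sector
    end = sector.index(0, start)          # the lodsb loop stops at the zero
    return sector[start:end].decode("ascii")


# Rebuild only the relevant parts of the sector from the hex dump.
sector = bytearray(512)
sector[0x12C:0x12C + 24] = b"Invalid partition table\0"
sector[0x144:0x144 + 31] = b"Error loading operating system\0"
sector[0x163:0x163 + 25] = b"Missing operating system\0"
sector[0x1B5:0x1B8] = bytes([0x2C, 0x44, 0x63])

for ptr in (0x1B5, 0x1B6, 0x1B7):
    print(message_at(sector, ptr))
```

The hardcoded 7 in AH is what ties the scheme to this layout: it only works while every message starts somewhere between 0100 and 01FF in the sector.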

.unbootable:
0000:063A  A0B507            mov al, [0x0600 + invalid_partition_ptr]

.print_and_freeze:
0000:063D  B407              mov ah, 0x07
0000:063F  8BF0              mov si, ax

.print_and_freeze_loop:
0000:0641  AC                lodsb

.freeze:
0000:0642  3C00              cmp al, 0
0000:0644  74FC              jz .freeze
0000:0646  BB0700            mov bx, 0x0007
0000:0649  B40E              mov ah, 0x0e
0000:064B  CD10              int 0x10
0000:064D  EBF2              jmp short .print_and_freeze_loop

If we have not been smitten, the MBR comes here and tries to boot the operating system. It first tries to read the boot loader by calling the procedure I have labeled read_drive.

If that does not work the first time, it checks if the partition uses the Windows 95 FAT32 format (type 0B or 0C), and if so, it adds six to the starting sector number and to the logical block address in its partition record (only in its copy in the computer memory, it does not write the changed values back to the MBR on the drive) and tries calling read_drive again. The code does not say why it adds 6, but the most likely reason is that FAT32 volumes conventionally keep a backup copy of their boot sector six sectors after the primary one, so this is a second chance to boot from the backup.

To ensure that it only does this trick once, it uses a flag: as the very first thing in mbr.load_boot_code, it sets the first byte after the active partition record to zero (it does so by setting it equal to CL, which became 0 in the mbr.shall_I_smite_thee check above). Then, right before the addition of 6, it increases the value of this flag in mbr.fat_check.

If it successfully reads the boot code from the hard drive, it checks for the magic number of AA55 at the end of the loaded boot code, and if it is there, it transfers control to the boot code. The operating system is loaded and the job of the MBR is finished.

In case it does not succeed, it prints an error message and freezes the computer. If the magic number is still missing after the FAT32 retry, it says the operating system is missing. If it cannot read from the disk even after five tries, or if the magic number is missing on a partition that is not FAT32, it just mentions an “error loading operating system.” When this error indicates a problem with reading from a hard drive, I suggest immediately using Spinrite set at level 5. Chances are it will fix the problem. If it can’t, nothing will.

.load_boot_code:
0000:064F  884E10            mov [bp+0x10], cl   ; CL = 0
0000:0652  E84600            call read_drive
0000:0655  732A              jnc .magic_check

.fat_check:
0000:0657  FE4610            inc byte [bp+0x10]          ; set flag
0000:065A  807E040B          cmp byte [bp+0x04], 0x0b    ; FAT32?
0000:065E  740B              je .cholesterol

0000:0660  807E040C          cmp byte [bp+0x04], 0x0c    ; FAT32 with LBA?
0000:0664  7405              je .cholesterol
0000:0666  A0B607            mov al, [0x0600 + error_loading_ptr]
0000:0669  75D2              jne .print_and_freeze

.cholesterol:                ; FAT32 partition
0000:066B  80460206          add byte [bp+0x02], 6
0000:066F  83460806          add word [bp+0x08], byte 6  ; See Note 2
0000:0673  83560A00          adc word [bp+0x0a], byte 0  ; See Note 2
0000:0677  E82100            call read_drive             ; One last try
0000:067A  7305              jnc .magic_check
0000:067C  A0B607            mov al, [0x0600 + error_loading_ptr]
0000:067F  EBBC              jmp short .print_and_freeze

.magic_check:
0000:0681  813EFE7D55AA      cmp word [0x7dfe], 0xaa55   ; the magic number
0000:0687  740B              je .boot_already
0000:0689  807E1000          cmp byte [bp+0x10], 0       ; tried FAT32 yet?
0000:068D  74C8              je .fat_check
0000:068F  A0B707            mov al, [0x0600 + missing_system_ptr]
0000:0692  EBA9              jmp short .print_and_freeze

.boot_already:
0000:0694  8BFC              mov di, sp          ; SP = 7C00
0000:0696  1E                push ds             ; destination CS (0)
0000:0697  57                push di             ; destination CP (7C00)
0000:0698  8BF5              mov si, bp          ; DS:SI -> active record
0000:069A  CB                retf                ; Go boot!!!
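
The jumps between mbr.load_boot_code, mbr.fat_check, and mbr.magic_check are easier to follow as structured code. The Python below is my own model of that flow, not a translation of anything Microsoft wrote; read_drive here is a hypothetical callback that returns the 512 bytes it read, or None on failure.

```python
FAT32_TYPES = (0x0B, 0x0C)


def load_boot_code(read_drive, partition_type):
    """Model of .load_boot_code through .boot_already.

    read_drive(extra) is a hypothetical callback: it tries to read the boot
    sector at (start + extra) and returns 512 bytes, or None on failure.
    Returns "boot" or the text of the message the MBR would print.
    """
    tried_fat32 = False                       # the flag byte at [bp+0x10]
    sector = read_drive(0)
    while True:
        if sector is not None:                # .magic_check
            if sector[510:512] == b"\x55\xAA":
                return "boot"
            if tried_fat32:
                return "Missing operating system"
        tried_fat32 = True                    # .fat_check
        if partition_type not in FAT32_TYPES:
            return "Error loading operating system"
        sector = read_drive(6)                # .cholesterol: one last try
        if sector is None:
            return "Error loading operating system"


good = bytes(510) + b"\x55\xAA"
print(load_boot_code(lambda extra: good, 0x07))
print(load_boot_code(lambda extra: bytes(512), 0x0B))
```

One thing the model makes visible: if the first read succeeds but the magic number is missing, a partition that is not FAT32 ends up with the "Error loading operating system" message, because mbr.fat_check is reached before the "Missing operating system" branch.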

The final part of the MBR code is a procedure to read the boot code from the drive to the memory at 0000:7C00. It is a procedure, rather than inline code, because it is used twice by the mbr.load_boot_code routine, from which it has to be called.

It makes up to five attempts to read the boot sector. It returns as soon as it succeeds, in which case it clears the carry flag of the microprocessor. If it fails five times, it gives up and returns with the carry flag set. It uses the DI register to keep track of how many times it tries.

It starts by getting the drive parameters using BIOS interrupt 13h. If it gets those parameters successfully, it calculates the number of sectors that the drive’s cylinder/head/sector geometry can address. If the partition starts at or beyond that limit, so that it cannot be reached by the traditional use of int 13h, it reads it using the newer LBA extensions (after checking that the BIOS supports them). If the partition starts below the limit, or if it cannot get the parameters, it uses the older int 13h reading method.
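
Both halves of that decision can be sketched in Python: the size calculation from the registers that int 13h, AH=08h returns, and the 16-byte packet that the LBA read builds on the stack. The function names are mine, and the CX and DX values are a hypothetical BIOS answer (1024 cylinders, 240 heads, 63 sectors per track), not something read from my machine.

```python
import struct


def chs_total_sectors(cx, dx):
    """Sectors addressable through CHS, from the int 13h AH=08h results.

    cx: CH = low 8 bits of the last cylinder number; CL = high 2 cylinder
        bits in bits 7-6, sectors per track in bits 5-0.
    dx: DH = last head number (DL is ignored here).
    """
    ch, cl = cx >> 8, cx & 0xFF
    sectors_per_track = cl & 0x3F              # and al, 0x3f
    heads = (dx >> 8) + 1                      # mov bl, dh / inc bx
    cylinders = (((cl >> 6) << 8) | ch) + 1    # xchg, shr dh, cl / inc dx
    return sectors_per_track * heads * cylinders


def needs_lba(start_lba, cx, dx):
    """True when the partition starts at or beyond the CHS-addressable area."""
    return start_lba >= chs_total_sectors(cx, dx)


def disk_address_packet(start_lba, offset=0x7C00, segment=0x0000, count=1):
    """The 16 bytes that int 13h, AH=42h finds at DS:SI after the pushes."""
    # Fields, lowest address first: packet size, reserved byte, block count,
    # buffer offset, buffer segment, 64-bit starting block number.
    return struct.pack("<BBHHHQ", 0x10, 0x00, count, offset, segment, start_lba)


CX, DX = 0xFFFF, 0xEF01          # hypothetical BIOS answer, see above
START_LBA = 0x008CF730           # second partition in the hex dump

print(chs_total_sectors(CX, DX))
print(needs_lba(START_LBA, CX, DX))
print(disk_address_packet(START_LBA).hex())
```

Because the stack grows downwards, the code pushes the packet fields in reverse order, so the last push (the packet size) ends up at the lowest address, where SI points.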

read_drive:
0000:069B  BF0500            mov di, 5           ; Try up to 5 times
0000:069E  8A5600            mov dl, [bp]        ; Drive number
0000:06A1  B408              mov ah, 8           ; Get drive parameters
0000:06A3  CD13              int 0x13
0000:06A5  7223              jc .bios_read
                             ; Calculate the size of the drive
0000:06A7  8AC1              mov al, cl
0000:06A9  243F              and al, 0x3f
0000:06AB  98                cbw
0000:06AC  8ADE              mov bl, dh
0000:06AE  8AFC              mov bh, ah
0000:06B0  43                inc bx
0000:06B1  F7E3              mul bx
0000:06B3  8BD1              mov dx, cx
0000:06B5  86D6              xchg dl, dh
0000:06B7  B106              mov cl, 0x06
0000:06B9  D2EE              shr dh, cl
0000:06BB  42                inc dx
0000:06BC  F7E2              mul dx

                             ; If the drive is small, use the traditional
                             ; BIOS routines, otherwise the LBA routines
0000:06BE  39560A            cmp [bp+0x0a], dx
0000:06C1  7723              ja .lba_check
0000:06C3  7205              jb .bios_read
0000:06C5  394608            cmp [bp+0x08], ax
0000:06C8  731C              jae .lba_check

                             ; This is the traditional BIOS read for
                             ; older and smaller drives
.bios_read:
0000:06CA  B80102            mov ax, 0x0201      ; Read 1 sector
0000:06CD  BB007C            mov bx, 0x7c00      ; to ES:7C00
0000:06D0  8B4E02            mov cx, [bp+0x02]   ; cylinder/sector
0000:06D3  8B5600            mov dx, [bp]        ; Drive number/head
0000:06D6  CD13              int 0x13
0000:06D8  7351              jnc .done           ; Success!
0000:06DA  4F                dec di              ; Did we try 5 times yet?
0000:06DB  744E              jz .done            ; ... Yes, give up!
0000:06DD  32E4              xor ah, ah          ; ... No, reset drive ...
0000:06DF  8A5600            mov dl, [bp]        ;     and try again!
0000:06E2  CD13              int 0x13
0000:06E4  EBE4              jmp short .bios_read

.lba_check:
0000:06E6  8A5600            mov dl, [bp]        ; Drive number
0000:06E9  60                pusha
0000:06EA  BBAA55            mov bx, 0x55aa
0000:06ED  B441              mov ah, 0x41        ; LBA install check
0000:06EF  CD13              int 0x13
0000:06F1  7236              jc .give_up
0000:06F3  81FB55AA          cmp bx, 0xaa55      ; installed?
0000:06F7  7530              jne .give_up
0000:06F9  F6C101            test cl, 0x01       ; got extended access?
0000:06FC  742B              jz .give_up
0000:06FE  61                popa

                             ; This reads the new, LBA-mapped, drives
.lba_read:
0000:06FF  60                pusha
                             ; If "push byte" confuses you,
                             ; see Note 2
0000:0700  6A00              push byte 0         ; where to read from
0000:0702  6A00              push byte 0         ;       - " -
0000:0704  FF760A            push word [bp+0x0a] ;       - " -
0000:0707  FF7608            push word [bp+0x08] ;       - " -
0000:070A  6A00              push byte 0         ; where to read to (high)
0000:070C  68007C            push word 0x7c00    ; where to read to (low)
0000:070F  6A01              push byte 1         ; num blocks to read
0000:0711  6A10              push byte 0x10      ; size of packet
0000:0713  B442              mov ah, 0x42        ; LBA read
0000:0715  8BF4              mov si, sp
0000:0717  CD13              int 0x13
0000:0719  61                popa
0000:071A  61                popa
0000:071B  730E              jnc .done           ; Success!
0000:071D  4F                dec di              ; Tried 5 times yet?
0000:071E  740B              jz .done            ; ... Yes, give up!
0000:0720  32E4              xor ah, ah          ; ... No, reset drive ...
0000:0722  8A5600            mov dl, [bp]        ;     and try again
0000:0725  CD13              int 0x13
0000:0727  EBD6              jmp short .lba_read

.give_up:
0000:0729  61                popa
0000:072A  F9                stc                 ; error return :(

.done:
0000:072B  C3                ret

Code ends here, all 300 bytes of it. The rest of the MBR is data.

invalid_partition:
0000:072C  49 ...            db 'Invalid partition table', 0
error_loading:
0000:0744  45 ...            db 'Error loading operating system', 0
missing_system:
0000:0763  4D ...            db 'Missing operating system', 0

       times 0x01b5 - ($-$$) db 0                ; skip to 01B5 (+ 0600)

invalid_partition_ptr:
0000:07B5  2C                db invalid_partition - $$
error_loading_ptr:
0000:07B6  44                db error_loading - $$
missing_system_ptr:
0000:07B7  63                db missing_system - $$

0000:07B8  03410555          dd 0x55054103       ; serial number?
0000:07BC  0000              dw 0

partition_table:
                            ; first partition
0000:07BE  00                db 0                ; non-bootable partition
0000:07BF  010100            db 1, 1, 0          ; head/sector/cylinder
0000:07C2  12                db 0x12             ; configuration partition
0000:07C3  EFBF62            db 0xef, 0xbf, 0x62 ; C/H/S of last sector
0000:07C6  3F000000          dd 0x0000003f       ; logical block address
0000:07CA  F1F68C00          dd 0x008cf6f1       ; size in sectors

                            ; second partition
0000:07CE  80                db 0x80             ; bootable partition
0000:07CF  008163            db 0, 0x81, 0x63    ; head/sector/cylinder
0000:07D2  07                db 0x07             ; Windows NTFS
0000:07D3  EFFFFF            db 0xef, 0xff, 0xff ; C/H/S of last sector
0000:07D6  30F78C00          dd 0x008cf730       ; logical block address
0000:07DA  D05B1B04          dd 0x041b5bd0       ; size in sectors

0000:07DE  00 ...            times 16 db 0       ; no third partition
0000:07EE  00 ...            times 16 db 0       ; no fourth partition

magic:
0000:07FE  55AA              dw 0xaa55           ; MBR magic number
mainlen equ $-mbr.main
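
To see what the partition table in the listing above actually says, here is a short Python sketch that decodes its 64 bytes. The function name and the dictionary keys are my own; the bytes are copied from offsets 01BE through 01FD of the hex dump.

```python
import struct

# The 64 bytes at offsets 01BE-01FD of the hex dump.
TABLE = bytes.fromhex(
    "00 01 01 00 12 EF BF 62 3F 00 00 00 F1 F6 8C 00 "
    "80 00 81 63 07 EF FF FF 30 F7 8C 00 D0 5B 1B 04 "
) + bytes(32)


def decode_entry(entry):
    """Decode one 16-byte partition record into a dictionary."""
    status, h1, s1, c1, ptype, h2, s2, c2, lba, size = struct.unpack(
        "<8BII", entry)
    return {
        "bootable": status == 0x80,
        "type": ptype,
        # CHS bytes are stored as head, then sector with the two high
        # cylinder bits in bits 7-6, then the low eight cylinder bits.
        "first_chs": (((s1 >> 6) << 8) | c1, h1, s1 & 0x3F),
        "last_chs": (((s2 >> 6) << 8) | c2, h2, s2 & 0x3F),
        "start_lba": lba,
        "sectors": size,
    }


entries = [decode_entry(TABLE[i:i + 16]) for i in range(0, 64, 16)]
for entry in entries[:2]:
    print(entry)
```

A quick consistency check falls out of this: the first partition starts at block 63 and is 9238257 sectors long, and 63 + 9238257 = 9238320, which is exactly where the second partition starts.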

Notes

Note 1: My boot drive originally came built into a flaky HP Pavilion XT993 computer, which came with Windows XP preinstalled and with no Windows setup disk. Instead, HP created a small hidden partition at the start of the drive, where the setup software has been hiding to this day. That is why it is the second partition of the drive that is bootable.

Note 2: If it seems that some parts of the code are trying to assign a byte value to a word register or memory area, that is an illusion. The 80xxx processors come with two different machine codes for many of their operations. This includes smaller (in terms of bytes used for the machine code) codes to assign a small constant, that fits in a byte, to a word-sized register or memory (or even stack). Microsoft’s MASM hides this from the programmer and tries to guess which machine code to use. NASM, on the other hand, wants the programmer to have complete control over the way the assembly language code is assembled into machine code. It uses the byte keyword to be told to use the smaller machine code where available.

Thus, both push 0 and push byte 0 will place a word-sized zero on the stack (in the 16-bit mode used by the MBR) but the latter will assemble to smaller machine language.

Note 3: For details on int 13h, see part B of Ralf Brown’s Interrupt List.

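Note 4: In NASM, the $$ token stands for the address of the start of the current section. Subtracting it from a label converts the label from a pointer into a plain integer, namely the label’s offset from the start of the section.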

For example, db invalid_partition would produce an error because it attempts to assign the 16-bit invalid_partition pointer to a byte.

But db invalid_partition - $$ works fine because invalid_partition - $$ is an integer. The integer is too big to fit to a byte, so NASM just assigns its low eight bits to the byte, which is what we want.

The Bugs

There is just one bug in this MBR that I can see. It may explain why, on rare occasions, my computer just hangs instead of booting up, and the only recourse I have is turning it off with the switch on the back of the tower and turning it back on, after which it boots up without a hitch! And millions of people probably have the same occasional problem.

The Interrupt Bug

The bug is right at the beginning of the code. There should be a CLI op code to disable hardware interrupts either before or after the very first line of the code. Unfortunately, this Master Boot Record is completely CLI-less!

The computer is filled with hardware, which may still be initializing itself at the time the MBR starts executing. If a device issues a hardware interrupt after the stack segment was changed but before the new stack pointer was assigned, the processor will store the flags and a return address at some unpredictable location, so it can overwrite some code, and the computer will end up in limbo. The only way to recover will be powering it down (or pressing the hardware reset button, if you have one) and starting over.

For the bug to bite, the interrupt has to happen at precisely the right nanosecond, or so, which makes it quite rare, and is perhaps why Microsoft did not catch this bug during their test cycle. But it does happen occasionally.

A Corrected Version

To correct the code, all we need to do is place a CLI either before or after the first line, as I have mentioned already. Placing it after the first makes, very strictly speaking, for more efficient code on modern Intel and AMD processors. I am not going to get into the details of that because this is not about optimizing assembly language code but about fixing a bug in the MBR.

The machine code for the CLI is just one byte long, with the added bonus that mbr.main now starts at an even address and mainlen becomes an even number. Because of that, we can replace the rep movsb with rep movsw for another optimization.
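
The arithmetic behind that switch is simple enough to check in Python. This is just a sketch; the variable names are mine, and the offsets come from the listing of the original code.

```python
SECTOR_SIZE = 0x200
CLI_SIZE = 1                          # the cli opcode is one byte long
old_main = 0x1B                       # offset of .main in the original MBR
new_main = old_main + CLI_SIZE        # offset of .main once cli is added

old_len = SECTOR_SIZE - old_main      # mainlen before the fix
new_len = SECTOR_SIZE - new_main      # mainlen after the fix
words = new_len // 2                  # the count that rep movsw needs

print(hex(old_len), hex(new_len), words)
```

The original length is odd, so a word copy would have missed the final byte (half of the magic number); with the extra byte in front, the length divides evenly by two.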

With that said, here is what the corrected MBR for this particular drive (since it contains its partition table and serial number) looks like (again, in NASM notation):

mbr:
	xor ax, ax
	cli
	mov ss, ax
	mov sp, 0x7c00
	sti

	push ax
	pop es
	push ax
	pop ds

	cld
	mov si, 0x7c00 + .main - mbr
	mov di, 0x0600 + .main - mbr
	push ax
	push di
	mov cx, mainlen / 2
	rep movsw
	retf

.main:
	mov bp, 0x0600 + partition_table - mbr
	mov cl, 4

.find_bootable_drive:
	cmp [bp], ch
	jl .bootable
	jne .unbootable

	add bp, byte +0x10
	loop .find_bootable_drive
	int 0x18

.bootable:
	mov si, bp

.shall_I_smite_thee:
	add si, byte +0x10
	dec cx
	jz .load_boot_code
	cmp [si], ch
	je .shall_I_smite_thee

.unbootable:
	mov al, [0x0600 + invalid_partition_ptr]

.print_and_freeze:
	mov ah, 0x07
	mov si, ax

.print_and_freeze_loop:
	lodsb

.freeze:
	cmp al, 0
	jz .freeze
	mov bx, 0x0007
	mov ah, 0x0e
	int 0x10
	jmp short .print_and_freeze_loop

.load_boot_code:
	mov [bp+0x10], cl
	call read_drive
	jnc .magic_check

.fat_check:
	inc byte [bp+0x10]
	cmp byte [bp+0x04], 0x0b
	je .cholesterol

	cmp byte [bp+0x04], 0x0c
	je .cholesterol
	mov al, [0x0600 + error_loading_ptr]
	jne .print_and_freeze

.cholesterol:
	add byte [bp+0x02], 6
	add word [bp+0x08], byte 6
	adc word [bp+0x0a], byte 0
	call read_drive
	jnc .magic_check
	mov al, [0x0600 + error_loading_ptr]
	jmp short .print_and_freeze

.magic_check:
	cmp word [0x7dfe], 0xaa55
	je .boot_already
	cmp byte [bp+0x10], 0
	je .fat_check
	mov al, [0x0600 + missing_system_ptr]
	jmp short .print_and_freeze

.boot_already:
	mov di, sp
	push ds
	push di
	mov si, bp
	retf

read_drive:
	mov di, 5
	mov dl, [bp]
	mov ah, 8
	int 0x13
	jc .bios_read

	mov al, cl
	and al, 0x3f
	cbw
	mov bl, dh
	mov bh, ah
	inc bx
	mul bx
	mov dx, cx
	xchg dl, dh
	mov cl, 0x06
	shr dh, cl
	inc dx
	mul dx

	cmp [bp+0x0a], dx
	ja .lba_check
	jb .bios_read
	cmp [bp+0x08], ax
	jae .lba_check

.bios_read:
	mov ax, 0x0201
	mov bx, 0x7c00
	mov cx, [bp+0x02]
	mov dx, [bp]
	int 0x13
	jnc .done

	dec di
	jz .done

	xor ah, ah
	mov dl, [bp]
	int 0x13
	jmp short .bios_read

.lba_check:
	mov dl, [bp]
	pusha
	mov bx, 0x55aa
	mov ah, 0x41
	int 0x13
	jc .give_up
	cmp bx, 0xaa55
	jne .give_up
	test cl, 0x01
	jz .give_up
	popa

.lba_read:
	pusha
	push byte 0
	push byte 0
	push word [bp+0x0a]
	push word [bp+0x08]
	push byte 0
	push word 0x7c00
	push byte 1
	push byte 0x10
	mov ah, 0x42
	mov si, sp
	int 0x13
	popa
	popa
	jnc .done

	dec di
	jz .done

	xor ah, ah
	mov dl, [bp]
	int 0x13
	jmp short .lba_read

.give_up:
	popa
	stc

.done:
	ret

invalid_partition:
	db 'Invalid partition table', 0
error_loading:
	db 'Error loading operating system', 0
missing_system:
	db 'Missing operating system', 0

	times 0x01b5 - ($-$$) db 0

invalid_partition_ptr:
	db invalid_partition - $$
error_loading_ptr:
	db error_loading - $$
missing_system_ptr:
	db missing_system - $$

serial_number:
	dd 0x55054103
	dw 0

partition_table:
	db 0                ; non-bootable partition
	db 1, 1, 0          ; head/sector/cylinder
	db 0x12             ; configuration partition
	db 0xef, 0xbf, 0x62 ; C/H/S of last sector
	dd 0x0000003f       ; logical block address
	dd 0x008cf6f1       ; size in sectors

	db 0x80             ; bootable partition
	db 0, 0x81, 0x63    ; head/sector/cylinder
	db 0x07             ; Windows NTFS
	db 0xef, 0xff, 0xff ; C/H/S of last sector
	dd 0x008cf730       ; logical block address
	dd 0x041b5bd0       ; size in sectors

	times 16 db 0       ; no third partition
	times 16 db 0       ; no fourth partition

magic:
	dw 0xaa55
mainlen equ $-mbr.main

One final note: While being CLI-less is a bug, it is a small bug. It is not worth re-writing your MBR, in my opinion. On the rare occasion that it freezes your system, just hit the reset button, or power the system down, wait a few moments, and power it up again.

Copyright © 2007 G. Adam Stanislav
All rights reserved.