Weird A20-related behaviour (bug?)

vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Weird A20-related behaviour (bug?)

Post by vbdasc »

So, I was experimenting with the "DOS=HIGH" setting in DOS, and I encountered some weird behaviour that's bugging me. Or it could be entirely logical and only seem weird due to my ignorance. Basically, I set up a PCem virtual machine with the IBM PC/AT BIOS, an 80286 CPU, 1 MB of extended memory, 2 FDDs, 1 HDD, EGA etc., and installed IBM PC-DOS 5.00 (from May 1991) on it. The CONFIG.SYS file loads HIMEM.SYS, sets "DOS=HIGH" and does nothing else. There is no AUTOEXEC.BAT file. I was able to determine (via the DEBUG tool) that the A20 line is always enabled in this case, and that the HMA contains a VDISK header and data different from the first 64K of conventional memory (which is the definitive sign that A20 is indeed enabled). But this short program (anyone can enter it quickly and easily with DEBUG; numbers are hexadecimal, as in DEBUG)

Code: Select all

CS:100	MOV BX,0
	MOV DS,BX
	MOV BX, FFF0
	MOV ES,BX
	MOV WORD PTR [8],0
	ES:
	MOV WORD PTR [108],1
	CMP WORD PTR [8],0
	JZ 124
	MOV DL,41
	MOV AH,2
	INT 21
CS:124	MOV AH,0
	INT 21
	
prints "A", which means that the comparison has yielded a "non-equal" result, which can only mean that the addresses 0:8 and FFF0:108 are aliased, hence A20 is disabled... but this is impossible!
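For anyone who wants to check the aliasing arithmetic, here is a minimal sketch in Python rather than DEBUG. It only models how a real-mode segment:offset pair turns into a physical address when address bit 20 is forced to zero; the function name `linear_address` is made up for this illustration.

```python
A20_BIT = 1 << 20  # address line 20, the one the A20 gate can force to zero


def linear_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Physical address of segment:offset in real mode on a 286 or later."""
    address = (segment << 4) + offset
    if not a20_enabled:
        address &= ~A20_BIT  # with the gate closed, bit 20 never reaches the bus
    return address


low = linear_address(0x0000, 0x0008, a20_enabled=True)
high_on = linear_address(0xFFF0, 0x0108, a20_enabled=True)
high_off = linear_address(0xFFF0, 0x0108, a20_enabled=False)

print(f"0000:0008          -> {low:06X}")
print(f"FFF0:0108, A20 on  -> {high_on:06X}")
print(f"FFF0:0108, A20 off -> {high_off:06X}")
```

With A20 enabled the two locations are 1 MB apart; with A20 disabled they are the same byte, which is the only way the write through ES can change the word read back through DS.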

Even more baffling is that when I run this little program through DEBUG, it behaves normally, as expected. The weird behaviour is only encountered when I run it from COMMAND.COM.

Unfortunately, I have no real 286 machine to test this on real hardware. A data cache could be confused by the A20 line and could give a result similar to this, but... the 80286 and the IBM PC/AT had no data caches, only an instruction prefetch queue, which can do no such tricks. Any thoughts? Is it possible that this is a PCem bug?

Thanks.
Arjen42
Posts: 131
Joined: Fri 11 Jun, 2021 3:15 pm

Re: Weird A20-related behaviour (bug?)

Post by Arjen42 »

vbdasc wrote: Thu 24 Mar, 2022 10:49 am Even more baffling is that when I run this little program through DEBUG, it behaves normally, as expected. The weird behaviour is only encountered when I run it from COMMAND.COM.
What do you mean by running it from COMMAND.COM?
Arjen42
Posts: 131
Joined: Fri 11 Jun, 2021 3:15 pm

Re: Weird A20-related behaviour (bug?)

Post by Arjen42 »

What happens if you change the code to this?

Code: Select all

CS:100	MOV BX,0
	MOV DS,BX
	MOV BX, FFF0
	MOV ES,BX
	DS:
	MOV WORD PTR [8],0
	ES:
	MOV WORD PTR [108],1
	DS:
	CMP WORD PTR [8],0
	JZ 126
	MOV DL,41
	MOV AH,2
	INT 21
CS:126	MOV AH,0
	INT 21
vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Re: Weird A20-related behaviour (bug?)

Post by vbdasc »

Arjen42 wrote: Thu 24 Mar, 2022 3:08 pm What happens if you change the code to this?
Absolutely the same thing, without any differences in behaviour.
Arjen42 wrote: Thu 24 Mar, 2022 3:08 pm What do you mean by running it from COMMAND.COM?
It means directly from the DOS command line :)
Bret
Posts: 16
Joined: Tue 22 Mar, 2022 10:50 pm

Re: Weird A20-related behaviour (bug?)

Post by Bret »

Here's a code snippet/subroutine i use in a "real" program to test whether or not A20 is enabled. This particular code requires a 386 CPU, but could be easily modified to work on a 286 (change the CMPSD to CMPSW, and change the CX counter value for the CMPSW). This code also doesn't potentially change anything in the Interrupt Vector Table (IVT) like yours does.

Code: Select all

;------------------------------------------------------------------------------
;TEST TO SEE IF THE A20 LINE IS ENABLED OR NOT
;Inputs:   CLD already issued
;Outputs:  ZF = Set if A20 is Disabled
;             = Clear if A20 Enabled
;Changes:
;NOTES: This should only be called from Real Mode!
;------------------------------------------------------------------------------
TestA20:
  PUSH CX,DI,SI    ;Save used registers
  PUSH DS,ES       ;Save used registers
  XOR  CX,CX       ;DS:SI =
  MOV  DS,CX       ;  0000:0000
  MOV  SI,CX       ;  (start of IVT)
  DEC  CX          ;ES:DI =
  MOV  ES,CX       ;  FFFF:0010
  LEA  DI,[SI+10h] ;  (mirror of IVT if A20 is Disabled)
  LEA  CX,[SI+4]   ;Test 4 DWords (should be enough for a valid test)
  REPE CMPSD       ;Do the test
  POP  ES,DS       ;Restore used registers
  POP  SI,DI,CX    ;Restore used registers
  RET
Also, here is an excerpt from the MS-DOS 7.1 HELP for the A20 option in HIMEM.SYS (I don't have easy access to the DOS 5.0 Help program, but assume it is similar). If you're actually wanting to control the A20 line, that gets much more involved.

Code: Select all

/A20CONTROL:ON|OFF
   Specifies whether HIMEM is to take control of the A20 line even if A20
   was on when HIMEM was loaded. The A20 handler gives your computer access
   to the HMA. If you specify /A20CONTROL:OFF, HIMEM takes control of the
   A20 line only if A20 was off when HIMEM was loaded. The default setting
   is /A20CONTROL:ON.
Two questions:

1) What exactly are you trying to accomplish?
2) Is your problem that doing things in DEBUG doesn't work the same way it does in a real program? All I can say is That is Life. Things don't always work in a debugger the same way they do in "real life". I run into that problem all the time, and sometimes even need to include special provisions in the programs to be able to use them with a debugger.
vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Re: Weird A20-related behaviour (bug?)

Post by vbdasc »

Bret wrote: Fri 25 Mar, 2022 1:26 am Here's a code snippet/subroutine i use in a "real" program to test whether or not A20 is enabled. This particular code requires a 386 CPU, but could be easily modified to work on a 286 (change the CMPSD to CMPSW, and change the CX counter value for the CMPSW). This code also doesn't potentially change anything in the Interrupt Vector Table (IVT) like yours does.
Yes, but yours doesn't give a 100% guarantee of a correct result. The likelihood of a false negative is small, but theoretically it could happen. Even if you test all 65520 bytes of the HMA, there is still no 100% guarantee. As for my code, it's not production code. It's an excerpt from a bigger subroutine that does everything properly: disabling interrupts, restoring the modified interrupt vectors and VDISK header, etc. It's an excerpt meant to be short and illustrative for posting here. Your code runs as expected, though, with no anomalies. But it doesn't answer the question.
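To show what I mean by a false negative, here is a small Python model of the compare-only approach. It is a sketch under my own assumptions, not a port of the routine above: memory is a plain `bytearray`, the "IVT" contents are made up, and `compare_test_says_enabled` is a hypothetical name.

```python
A20_BIT = 1 << 20
HMA_BASE = 0x100000  # physical address of FFFF:0010


def read_byte(memory, segment, offset, a20_enabled):
    """Read one byte the way real mode would, honouring the A20 gate."""
    address = (segment << 4) + offset
    if not a20_enabled:
        address &= ~A20_BIT
    return memory[address]


def compare_test_says_enabled(memory, a20_enabled, length=16):
    """Compare 0000:0000.. with FFFF:0010.. without writing anything."""
    for i in range(length):
        low = read_byte(memory, 0x0000, i, a20_enabled)
        high = read_byte(memory, 0xFFFF, 0x10 + i, a20_enabled)
        if low != high:
            return True   # the regions differ, so they cannot be aliased
    return False          # the regions match: aliased, or equal by coincidence


memory = bytearray(0x110000)                      # 1 MB plus room for the HMA
memory[0:16] = bytes(range(1, 17))                # made-up vector table bytes
memory[HMA_BASE:HMA_BASE + 16] = bytes(range(0xF0, 0x100))

usual_on = compare_test_says_enabled(memory, a20_enabled=True)
usual_off = compare_test_says_enabled(memory, a20_enabled=False)

# Now let the HMA happen to hold the same 16 bytes as the low region.
memory[HMA_BASE:HMA_BASE + 16] = memory[0:16]
coincidence = compare_test_says_enabled(memory, a20_enabled=True)

print(usual_on, usual_off, coincidence)
```

In the last case A20 is enabled, yet the test reports "disabled", because a read-only comparison cannot tell aliasing from identical contents. A test that writes to one address and reads the other does not have this weakness, at the cost of having to restore what it modified.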
Bret wrote: Fri 25 Mar, 2022 1:26 am Also, here is an excerpt from the MS-DOS 7.1 HELP for the A20 option in HIMEM.SYS (I don't have easy access to the DOS 5.0 Help program, but assume it is similar). If you're actually wanting to control the A20 line, that gets much more involved.

Code: Select all

/A20CONTROL:ON|OFF
   Specifies whether HIMEM is to take control of the A20 line even if A20
   was on when HIMEM was loaded. The A20 handler gives your computer access
   to the HMA. If you specify /A20CONTROL:OFF, HIMEM takes control of the
   A20 line only if A20 was off when HIMEM was loaded. The default setting
   is /A20CONTROL:ON.
Sorry, I don't see how this is relevant. I don't want to control the A20 line. I shouldn't and can't, either, because when "DOS=HIGH" is set the HMA is occupied by DOS, and hence a careless attempt to disable A20 will most probably just lead to a crash.
Bret wrote: Fri 25 Mar, 2022 1:26 am Two questions:

1) What exactly are you trying to accomplish?
2) Is your problem that doing things in DEBUG doesn't work the same way it does in a real program? All I can say is That is Life. Things don't always work in a debugger the same way they do in "real life". I run into that problem all the time, and sometimes even need to include special provisions in the programs to be able to use them with a debugger.
1) My goals, starting with the most short-term one: creating an A20 testing tool which always gives the same result as XMS function 07h; understanding how the "DOS=HIGH" mode works; understanding how XMS deals with the HMA and the A20 line.

2) No. That the code in question works differently under a debugger than without one is simply an observation. But yes, now I also want to understand the reason why. I'm aware of some anti-debugging techniques from the 8088/8086/80286 era that use the instruction prefetch queue, and I understand how they work. In this case, however, I see no convincing reason for the difference in behaviour. Now, I want to research all of this for myself, but I need to be sure that PCem doesn't lie to me. That's why I asked this question in the first place.
vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Re: Weird A20-related behaviour (bug?)

Post by vbdasc »

I did a more elaborate test, and I think I'm beginning to understand what happens. It seems that when the setting "DOS=HIGH" is present, DOS does some arcane deep wizardry behind the scenes. It seems that when running a program from the command line (or maybe from int 21h function 4Bh) DOS ensures that the A20 line is disabled at program startup, but at the first int 21h call from inside the program, the A20 line gets enabled and stays enabled for the lifetime of the program (so the DOS functions can work, as DOS is in the HMA). This alone can explain the weird behaviour I wrote about. Hence, there is no bug in PCem. It's all by (DOS) design. To be sure, I performed my tests on a PCJS virtual machine, and they behaved identically to PCem. I'm now sure that real hardware would behave the same way.
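Here is a toy Python model of what I think I'm seeing. It is only a sketch of the observed behaviour, not of DOS internals: the `Machine` class, its `int21` method and the `probe` function are all made up, and the only rule built in is "A20 is off when the program starts and turns on at the first int 21h call".

```python
A20_BIT = 1 << 20


class Machine:
    """Tiny model: real-mode memory plus an A20 gate that DOS flips."""

    def __init__(self):
        self.memory = bytearray(0x110000)  # 1 MB plus room for the HMA
        self.a20_enabled = False           # observed state at program start

    def _address(self, segment, offset):
        address = (segment << 4) + offset
        if not self.a20_enabled:
            address &= ~A20_BIT
        return address

    def write_word(self, segment, offset, value):
        a = self._address(segment, offset)
        self.memory[a:a + 2] = value.to_bytes(2, "little")

    def read_word(self, segment, offset):
        a = self._address(segment, offset)
        return int.from_bytes(self.memory[a:a + 2], "little")

    def int21(self):
        # Assumption taken from my tests: the first DOS call enables A20
        # and it stays enabled afterwards.
        self.a20_enabled = True


def probe(machine):
    """Same three steps as my test program; returns what it would print."""
    machine.write_word(0x0000, 0x0008, 0)
    machine.write_word(0xFFF0, 0x0108, 1)
    return "A" if machine.read_word(0x0000, 0x0008) != 0 else ""


m = Machine()
before_first_dos_call = probe(m)
m.int21()
after_first_dos_call = probe(m)
print(repr(before_first_dos_call), repr(after_first_dos_call))
```

In this model the probe prints "A" only when it runs before any DOS call, which matches my .COM program, because there the memory test comes before the first int 21h.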

Thanks for the insights and sorry for doubting this great emulator and for this offtopic thread. Case closed.

P.S. If I call the XMS function 4 (Global Disable A20) and 3 (Global Enable A20) in this sequence before running a program, it starts with A20 enabled, and A20 stays enabled.
Bret
Posts: 16
Joined: Tue 22 Mar, 2022 10:50 pm

Re: Weird A20-related behaviour (bug?)

Post by Bret »

I think disabling A20 at the beginning of the program and then enabling it shortly afterwards might be related to the LOADFIX issue. Supposedly, there are a very small number of programs that "play games" with memory address mathematics and rely/depend on the "wraparound" that happens when the memory offset goes below 0 or above FFFFh. If a program is in the first 64k of memory the program can crash. This problem is exacerbated (or maybe only occurs) when A20 is enabled since the program can accidentally access the HMA. I am not 100% certain of the details, and have not encountered the problem myself.

MS-DOS included the LOADFIX program starting with version 5, which makes sure a program doesn't load in the first 64k of memory. Apparently, the problem was common enough and important enough for MS to create a special program (LOADFIX) to address it. I THINK the reason DOS disables A20 when a program first starts is somehow related to the LOADFIX/wraparound issue, but am not sure. I can't think of any other logical reason for DOS to disable A20 when a program starts.
vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Re: Weird A20-related behaviour (bug?)

Post by vbdasc »

Bret wrote: Fri 25 Mar, 2022 7:52 pm I think disabling A20 at the beginning of the program and then enabling it shortly afterwards might be related to the LOADFIX issue. Supposedly, there are a very small number of programs that "play games" with memory address mathematics and rely/depend on the "wraparound" that happens when the memory offset goes below 0 or above FFFFh. If a program is in the first 64k of memory the program can crash. This problem is exacerbated (or maybe only occurs) when A20 is enabled since the program can accidentally access the HMA. I am not 100% certain of the details, and have not encountered the problem myself.

MS-DOS included the LOADFIX program starting with version 5, which makes sure a program doesn't load in the first 64k of memory. Apparently, the problem was common enough and important enough for MS to create a special program (LOADFIX) to address it. I THINK the reason DOS disables A20 when a program first starts is somehow related to the LOADFIX/wraparound issue, but am not sure. I can't think of any other logical reason for DOS to disable A20 when a program starts.
Yes, this is an interesting issue. Enabling the A20 line can indeed make some programs not work, or even crash the entire system by clobbering memory they don't own, if these programs rely on the so-called "1Mb address wraparound". One very interesting blog, www.os2museum.com, has done some great detective work in finding such programs. Basically, according to that blog, there are several well-known groups of programs relying on the wraparound: for example,

first group: programs using the CP/M-like "CALL CS:0005" method for calling DOS functions, instead of int 21h; such programs are very rare, but apparently one example is the spell-checker utility coming with some early versions of MS Word for DOS; it's also likely that some early versions of programs ported to DOS from CP/M-80 belonged to this group, but for now it seems that they all have been lost to the sands of time. It's the call to CS:0005 itself that uses the wraparound to route to the DOS function handler. Luckily, only a few bytes need to be made the same between the HMA and low memory to make the call work properly, even when A20 is enabled.

second group: programs built with some early versions of MS/IBM Pascal; this group includes versions of MASM and the Pascal itself, as well as possibly numerous other programs, commercial or not, made before MS/IBM Pascal faded. It's the Pascal startup code that uses the wraparound.

third group: programs built with some early versions of the EXEPACK executable-compressing utility - either standalone or built in the Microsoft LINK. And this group seems to be the most numerous, and the latest of the three. It's the initial uncompressing code that uses the wraparound.

And, of course, there might be programs which rely on the wraparound without belonging to any of the three groups, although they must be exceptionally rare. When Microsoft created XMS and HIMEM.SYS, the potential problem with this type of software undoubtedly popped up. MS solved this incompatibility by requiring in the XMS specification that everyone who touches the A20 line (for using the HMA or extended memory, or entering protected mode) must ensure that legacy programs (unaware of A20) run with A20 disabled.
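To make the first group a bit more concrete, here is the wraparound arithmetic as a Python sketch. One loud assumption: I'm using F01D:FEF0 as the far-call target stored at offset 5 of the PSP. That is the value I remember being commonly quoted, it is not something I verified for this thread, and it can differ between systems.

```python
A20_BIT = 1 << 20


def linear_address(segment, offset, a20_enabled):
    """Physical address of segment:offset, honouring the A20 gate."""
    address = (segment << 4) + offset
    return address if a20_enabled else address & ~A20_BIT


# ASSUMPTION: commonly quoted CALL 5 target, quoted from memory.
CALL5_SEGMENT, CALL5_OFFSET = 0xF01D, 0xFEF0

wrapped = linear_address(CALL5_SEGMENT, CALL5_OFFSET, a20_enabled=False)
unwrapped = linear_address(CALL5_SEGMENT, CALL5_OFFSET, a20_enabled=True)

# Express the A20-enabled target as an offset from segment FFFF.
offset_from_ffff = unwrapped - (0xFFFF << 4)

print(f"A20 off: {wrapped:06X}")
print(f"A20 on:  {unwrapped:06X} = FFFF:{offset_from_ffff:04X}")
```

Under that assumption the call lands at 0000:00C0 when the address wraps (the slot of interrupt vector 30h, since 30h * 4 = C0h) and at FFFF:00D0 when it doesn't, so the "few bytes" that have to match would be the ones at those two places.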

But soon the problem was exacerbated by the appearance of the new DOS 5.0, and more specifically its new "DOS=HIGH" mode, which moved parts of DOS itself into the HMA and consequently forced DOS to control the A20 line. Now, in this mode, a mere call to DOS would require A20 to be enabled (so that DOS would even be available). As the XMS specification states, A20 switching can be rather slow, so constantly turning it on and off before and after every DOS call was out of the question, for performance reasons at least. Hence, user programs had to spend most of their time with A20 enabled (so DOS could work). But how are the aforementioned compatibility problems solved then? It seems that while developing DOS 5.0, Microsoft made sure that they had several lines of defense, some of them created specifically for the "DOS=HIGH" mode.

first, when "DOS=HIGH", the aforementioned few bytes needed for correct "CALL CS:0005" execution are set up. Helps the first group.

second, LOADFIX helps the second and the third group, when it's applied manually.

third, when "DOS=HIGH", an arcane technique (exepatch) patches certain programs in memory after they're loaded, but before they're executed, to make them compatible.

fourth, DOS makes sure to start programs with A20 disabled, although A20 gets enabled later; this might help the third and maybe the second group, if the wraparound-using code is at the very beginning, before any DOS functions are used. Perhaps this gives an answer to one of your questions.

All these multiple lines of defense obviously worked well, because DOS 5 is regarded as a highly stable and compatible version, unlike its predecessor, DOS 4.
Arjen42
Posts: 131
Joined: Fri 11 Jun, 2021 3:15 pm

Re: Weird A20-related behaviour (bug?)

Post by Arjen42 »

vbdasc wrote: Thu 24 Mar, 2022 3:52 pm
Arjen42 wrote: Thu 24 Mar, 2022 3:08 pm What happens if you change the code to this?
Absolutely the same thing, without any differences in behaviour.
So how would it know to look for [8] in the data segment, e.g. (0:8), and [108] in the extra segment, e.g. fff0:108, in your original program?
vbdasc wrote: Thu 24 Mar, 2022 3:52 pm
Arjen42 wrote: Thu 24 Mar, 2022 3:08 pm What do you mean by running it from COMMAND.COM?
It means directly from the DOS command line :)
Do you mean written to a .com file instead of running inside the debug program?
vbdasc
Posts: 37
Joined: Tue 21 Mar, 2017 10:53 am

Re: Weird A20-related behaviour (bug?)

Post by vbdasc »

Arjen42 wrote: Sat 26 Mar, 2022 9:52 pm
So how would it know to look for [8] in the data segment, e.g. (0:8), and [108] in the extra segment, e.g. fff0:108, in your original program?
I'm not sure if I understand your question correctly. But in the reference "word ptr[8]" it's assumed that DS:8 is meant. DS is the default, per the x86 specification. Also, there is a reason why DS:, ES: etc. are called "segment overriding prefixes" instead of "segment setting prefixes": they're fully optional, and there are documented default segment registers for every instruction.
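If it helps, here is the same rule as a small Python sketch working on the machine-code bytes. The function name is made up, and it only covers the simple case from my program: a direct [address] operand, where DS is the default and a one-byte prefix in front of the instruction can override it.

```python
# Segment override prefix bytes and the segment register each one selects.
SEGMENT_PREFIXES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}


def segment_for_direct_operand(instruction: bytes) -> str:
    """Segment used by a direct [disp16] memory operand.

    Only leading segment override prefixes are examined; other prefixes
    (REP, LOCK) and BP-based addressing, which defaults to SS, are outside
    this sketch.
    """
    segment = "DS"  # documented default for a direct memory operand
    for byte in instruction:
        if byte not in SEGMENT_PREFIXES:
            break
        segment = SEGMENT_PREFIXES[byte]
    return segment


plain = bytes.fromhex("C70608000000")         # MOV WORD PTR [0008],0000
overridden = bytes.fromhex("26C70608010100")  # ES: MOV WORD PTR [0108],0001
explicit_ds = bytes.fromhex("3EC70608000000") # DS: MOV WORD PTR [0008],0000

for code in (plain, overridden, explicit_ds):
    print(code.hex(" "), "->", segment_for_direct_operand(code))
```

DEBUG shows the prefix on its own line, but in memory it is just the first byte of the instruction that follows it. The explicit DS: prefix only adds a byte and selects what was already the default, which is why the modified program behaved identically.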
Arjen42 wrote: Sat 26 Mar, 2022 9:52 pm

Do you mean written to a .com file instead of running inside the debug program?
Yes, written as .COM and executed outside of DEBUG. Sorry for the confusion.
Arjen42
Posts: 131
Joined: Fri 11 Jun, 2021 3:15 pm

Re: Weird A20-related behaviour (bug?)

Post by Arjen42 »

vbdasc wrote: Sun 27 Mar, 2022 10:06 am I'm not sure if I understand your question correctly. But in the reference "word ptr[8]" it's assumed that DS:8 is meant. DS is the default, per the x86 specification. Also, there is a reason why DS:, ES: etc. are called "segment overriding prefixes" instead of "segment setting prefixes": they're fully optional, and there are documented default segment registers for every instruction.
I know very little about assembly language, so your explanation helped a lot to understand. It makes sense DS is the default, since normally you would not mess with CS or SS, and use the ES only when needed. I didn't realize it was a prefix because it was put on its own line.