[Patch] Voodoo 3 WinUAE merge notes/patch

twilen
Posts: 6
Joined: Fri 14 Aug, 2020 9:21 am

Post by twilen »

First, very big thanks for Voodoo 3 emulation!

The Voodoo 3 WinUAE merge is mostly done. (Voodoo 3 is one of the boards supported by almost all Amiga-compatible operating systems, and it has far more VRAM than the usual boards from the days when 4M of VRAM was huge.) Here are some notes:

SST_status and SST_intrCtrl are also located in 2D register space (banshee_reg_readl).
SST_intrCtrl can also be written in 2D register space (banshee_reg_writel), bypassing the FIFO.

PCI IO space (third BAR) should support the full 32 bits. The latest Amiga Mediator PCI bridge driver (but not older ones, for some unknown reason) assumes the upper 16 bits are writable and gets confused otherwise. (Write FFFFFFFF; the read should return FFFFFF00, not 0000FF00.)
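To illustrate why the upper 16 bits matter, here is a minimal sketch of the usual BAR sizing arithmetic. The helper name `io_bar_size` is hypothetical, and whether the Mediator driver computes the size exactly this way is an assumption; the point is only what the two read-back values decode to. With the patch, byte 0x18 still returns 0x01, so the low flag bits are masked off first.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: decode the size of a PCI
   I/O BAR from the value read back after writing 0xFFFFFFFF to it. */
static uint32_t io_bar_size(uint32_t readback)
{
    /* Bits 1:0 of an I/O BAR are type flags, not address bits. */
    uint32_t mask = readback & ~UINT32_C(0x3);

    /* Two's complement of the writable-bit mask gives the decoded size. */
    return (uint32_t)(~mask + 1u);
}
```

With all 32 bits writable the read-back 0xFFFFFF01 decodes to 0x100 bytes, which matches the 0x0100 range passed to io_sethandlerx. With the upper half hardwired to zero, 0x0000FF01 decodes to 0xFFFF0100, which a driver expecting full 32-bit decoding could plausibly reject.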

Voodoo 3 does support the vidProcCfg interlace bit, even though the documentation says "Interlaced video out enable. For Avenger, since interlaced video out is not supported, this bit should remain 0 all the time." Without this fix, interlaced Amiga modes miss the bottom half.
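A standalone sketch of what the interlace fix in the patch does to the vertical timings. The struct `crt_timings` and the function `apply_interlace` are hypothetical stand-ins for the `svga_t` fields touched in banshee_recalctimings; the doubling itself is taken from the patch, and the reading that the programmed values describe a single field is an assumption.

```c
#include <stdint.h>

#define VIDPROCCFG_INTERLACE (1u << 3)

/* Hypothetical stand-in for the svga_t fields the patch modifies. */
struct crt_timings {
    int interlace;
    int vtotal;
    int dispend;
    int vblankstart;
    int vsyncstart;
};

/* Mirror of the patched logic: when the interlace bit is set, every
   vertical timing value is doubled so both fields are displayed. */
static void apply_interlace(struct crt_timings *t, uint32_t vidProcCfg)
{
    t->interlace = 0;
    if (vidProcCfg & VIDPROCCFG_INTERLACE) {
        t->interlace = 1;
        t->vtotal *= 2;
        t->dispend *= 2;
        t->vblankstart *= 2;
        t->vsyncstart *= 2;
    }
}
```

If the doubling is skipped, dispend stays at half the frame height, which is consistent with the missing bottom half described above.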

The FIFO status check can't be right. Without the included changes the Amiga driver always hung (it waits forever for the FIFO to empty), but I am not sure my change is correct.
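For reference, the patched FIFO part of banshee_status, pulled out into a standalone function. The name `fifo_status_bits` and the parameter are hypothetical; in the real source `fifo_entries` is defined elsewhere. The sketch assumes bits 4:0 report free FIFO space and bit 5 is a busy flag, which is what the patched arithmetic implies, and the `else` branch is kept exactly as in the patch.

```c
#include <stdint.h>

/* Standalone copy of the patched low status bits. Not a verified model
   of the hardware, only of what the patch computes. */
static uint32_t fifo_status_bits(int fifo_entries)
{
    uint32_t ret = 0;

    if (fifo_entries < 0x20)
        ret |= (uint32_t)(0x1f - fifo_entries); /* free slots remaining */
    else
        ret |= 0x1f;                            /* unchanged from the patch */

    if (fifo_entries)
        ret |= 0x20;                            /* entries still pending */

    return ret;
}
```

An empty FIFO now reads 0x1f with bit 5 clear. The old code returned 0 free slots in that case, so a driver polling for 0x1f would never see it.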

VGA vsync support was missing (the same is true of other boards; was it never used on PC?).
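The vsync handling added by the patch is a small state machine around CRTC register 0x11. Here is a self-contained sketch of it; the struct and function names are hypothetical, the logic follows the patch. On standard VGA, bit 4 of CRTC 0x11 clears the vertical interrupt when written as 0 and bit 5 disables it when set; bit 18 of pciInit0 is tested exactly as the patch does, without any claim about its documented name.

```c
#include <stdint.h>

/* vblank_irq: 1 = interrupt pending, 0 = idle, -1 = held in clear state. */
struct vsync_state {
    int vblank_irq;
};

/* Same condition as banshee_vga_vsync_enabled in the patch. */
static int vsync_irq_enabled(uint8_t crtc11, uint32_t pciInit0)
{
    return !(crtc11 & 0x20) && (crtc11 & 0x10) && ((pciInit0 >> 18) & 1);
}

/* Called when the guest writes CRTC register 0x11. */
static void on_crtc11_write(struct vsync_state *s, uint8_t val)
{
    if (!(val & 0x10)) {
        if (s->vblank_irq > 0)
            s->vblank_irq = -1; /* acknowledge and hold clear */
    } else if (s->vblank_irq < 0) {
        s->vblank_irq = 0;      /* re-arm for the next vblank */
    }
}

/* Called at the start of vertical blanking. */
static void on_vblank_start(struct vsync_state *s)
{
    if (s->vblank_irq >= 0)
        s->vblank_irq = 1;
}

/* Whether PCI INTA would be asserted for the current state. */
static int irq_line(const struct vsync_state *s, uint8_t crtc11, uint32_t pciInit0)
{
    return s->vblank_irq > 0 && vsync_irq_enabled(crtc11, pciInit0);
}
```

Note that a pending interrupt stays latched until bit 4 is written as 0, and no new interrupt is latched until bit 4 is set again.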

Obviously Amiga drivers also use the byteswap/swizzle bits (emulating these is annoying and only makes emulation more complex and slower), but I assume they are never used with little-endian CPUs. This is not included in the patch; it is implemented in wrapper code.
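As a rough idea of what such wrapper code has to do for a big-endian m68k guest, here are the two generic 32-bit transforms. The function names are hypothetical, and which register bits select which transform, and on which apertures, is not covered by these notes, so treat this as an illustration only.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value (0x11223344 -> 0x44332211). */
static uint32_t swap_bytes32(uint32_t v)
{
    return ((v & UINT32_C(0x000000ff)) << 24) |
           ((v & UINT32_C(0x0000ff00)) << 8)  |
           ((v & UINT32_C(0x00ff0000)) >> 8)  |
           ((v & UINT32_C(0xff000000)) >> 24);
}

/* Exchange the two 16-bit halves (0x11223344 -> 0x33441122). */
static uint32_t swap_words32(uint32_t v)
{
    return (uint32_t)((v << 16) | (v >> 16));
}
```

Doing this per access in the memory handlers is where the extra complexity and slowdown mentioned above would come from.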

(There are also the usual fixes for compiling C as stricter C++, but these are not that important and are not included in the patch.)

Messy patch:

diff --git a/pcem/vid_voodoo_banshee.cpp b/pcem/vid_voodoo_banshee.cpp
index 2ac803e5..16d714c5 100644
--- a/pcem/vid_voodoo_banshee.cpp
+++ b/pcem/vid_voodoo_banshee.cpp
@@ -88,6 +88,8 @@ typedef struct banshee_t
         uint32_t desktop_stride_tiled;
 
         int type;
+
+        int vblank_irq;
 } banshee_t;
 
 enum
@@ -148,6 +150,7 @@ enum
 #define VGAINIT0_EXTENDED_SHIFT_OUT (1 << 12)
 
 #define VIDPROCCFG_CURSOR_MODE (1 << 1)
+#define VIDPROCCFG_INTERLACE (1 << 3)
 #define VIDPROCCFG_HALF_MODE (1 << 4)
 #define VIDPROCCFG_OVERLAY_ENABLE (1 << 8)
 #define VIDPROCCFG_OVERLAY_CLUT_BYPASS (1 << 11)
@@ -204,6 +207,31 @@ enum
 #define VIDSERIAL_I2C_SCK_R (1 << 26)
 #define VIDSERIAL_I2C_SDA_R (1 << 27)
 
+static int banshee_vga_vsync_enabled(banshee_t *banshee)
+{
+    if (!(banshee->svga.crtc[0x11] & 0x20) && (banshee->svga.crtc[0x11] & 0x10) && ((banshee->pciInit0 >> 18) & 1) != 0)
+        return 1;
+    return 0;
+}
+
+static void banshee_update_irqs(banshee_t *banshee)
+{
+    if (banshee->vblank_irq > 0 && banshee_vga_vsync_enabled(banshee)) {
+        pci_set_irq(NULL, PCI_INTA);
+    } else {
+        pci_clear_irq(NULL, PCI_INTA);
+    }
+}
+
+static void banshee_vblank_start(svga_t* svga)
+{
+    banshee_t *banshee = (banshee_t*)svga->p;
+    if (banshee->vblank_irq >= 0) {
+        banshee->vblank_irq = 1;
+        banshee_update_irqs(banshee);
+    }
+}
+
 static uint32_t banshee_status(banshee_t *banshee);
 
 static void banshee_out(uint16_t addr, uint8_t val, void *p)
@@ -231,10 +259,21 @@ static void banshee_out(uint16_t addr, uint8_t val, void *p)
                 svga->crtc[svga->crtcreg] = val;
                 if (old != val)
                 {
-                        if (svga->crtcreg < 0xe || svga->crtcreg > 0x10)
+                        if (svga->crtcreg == 0x11) {
+                            if (!(val & 0x10)) {
+                                if (banshee->vblank_irq > 0)
+                                    banshee->vblank_irq = -1;
+                            } else if (banshee->vblank_irq < 0) {
+                                banshee->vblank_irq = 0;
+                            }
+                            banshee_update_irqs(banshee);
+                            if ((val & ~0x30) == (old & ~0x30))
+                                old = val;
+                        }
+                        if (svga->crtcreg < 0xe || svga->crtcreg > 0x11 || (svga->crtcreg == 0x11 && old != val))
                         {
-                                svga->fullchange = changeframecount;
-                                svga_recalctimings(svga);
+                            svga->fullchange = changeframecount;
+                            svga_recalctimings(svga);
                         }
                 }
                 break;
@@ -260,6 +299,8 @@ static uint8_t banshee_in(uint16_t addr, void *p)
                         temp = 0;
                 else
                         temp = 0x10;
+                if (banshee->vblank_irq > 0)
+                    temp |= 0x80;
                 break;
                 case 0x3D4:
                 temp = svga->crtcreg;
@@ -388,6 +429,7 @@ static void banshee_recalctimings(svga_t *svga)
         if (svga->crtc[0x1b] & 0x40) svga->vsyncstart  += 0x400;
 //        pclog("svga->hdisp=%i\n", svga->hdisp);
 
+        svga->interlace = 0;
         if (banshee->vgaInit0 & VGAINIT0_EXTENDED_SHIFT_OUT)
         {
                 switch (VIDPROCCFG_DESKTOP_PIX_FORMAT)
@@ -436,6 +478,15 @@ static void banshee_recalctimings(svga_t *svga)
                         svga->htotal *= 2;
                 }
 
+                if (banshee->vidProcCfg & VIDPROCCFG_INTERLACE)
+                {
+                    svga->interlace = 1;
+                    svga->vtotal *= 2;
+                    svga->dispend *= 2;
+                    svga->vblankstart *= 2;
+                    svga->vsyncstart *= 2;
+                }
+
                 svga->overlay.ena = banshee->vidProcCfg & VIDPROCCFG_OVERLAY_ENABLE;
 
                 svga->overlay.x = voodoo->overlay.start_x;
@@ -759,11 +814,11 @@ static uint32_t banshee_status(banshee_t *banshee)
         uint32_t ret;
 
         ret = 0;
-        if (fifo_size < 0x20)
-                ret |= fifo_size;
+        if (fifo_entries < 0x20)
+                ret |= 0x1f - fifo_entries;
         else
                 ret |= 0x1f;
-        if (fifo_size)
+        if (fifo_entries)
                 ret |= 0x20;
         if (swap_count < 7)
                 ret |= (swap_count << 28);
@@ -1001,6 +1056,14 @@ static uint32_t banshee_reg_readl(uint32_t addr, void *p)
                 voodoo_flush(voodoo);
                 switch (addr & 0x1fc)
                 {
+                        case SST_status:
+                        ret = banshee_status(banshee);
+                        break;
+
+                        case SST_intrCtrl:
+                        ret = banshee->intrCtrl & 0x0030003f;
+                        break;
+
                         case 0x08:
                         ret = voodoo->banshee_blt.clip0Min;
                         break;
@@ -1232,7 +1295,11 @@ static void banshee_reg_writel(uint32_t addr, uint32_t val, void *p)
                 break;
 
                 case 0x0100000: /*2D registers*/
-                voodoo_queue_command(voodoo, (addr & 0x1fc) | FIFO_WRITEL_2DREG, val);
+                    if ((addr & 0x3fc) == SST_intrCtrl) {
+                        banshee->intrCtrl = val & 0x0030003f;
+                    } else {
+                        voodoo_queue_command(voodoo, (addr & 0x1fc) | FIFO_WRITEL_2DREG, val);
+                    }
                 break;
                 
                 case 0x0200000: case 0x0300000: case 0x0400000: case 0x0500000: /*3D registers*/
@@ -2331,8 +2429,8 @@ static uint8_t banshee_pci_read(int func, int addr, void *p)
                 
                 case 0x18: ret = 0x01; break; /*ioBaseAddr*/
                 case 0x19: ret = banshee->ioBaseAddr >> 8; break;
-                case 0x1a: ret = 0x00; break;
-                case 0x1b: ret = 0x00; break;
+                case 0x1a: ret = banshee->ioBaseAddr >> 16; break;
+                case 0x1b: ret = banshee->ioBaseAddr >> 24; break;
 
                 /*Subsystem vendor ID*/
                 case 0x2c: ret = banshee->pci_regs[0x2c]; break;
@@ -2408,12 +2506,22 @@ static void banshee_pci_write(int func, int addr, uint8_t val, void *p)
                 banshee_updatemapping(banshee);
                 return;
 
+                case 0x1a:
+                    banshee->ioBaseAddr &= 0xff00ffff;
+                    banshee->ioBaseAddr |= val << 16;
+                    break;
+                case 0x1b:
+                    banshee->ioBaseAddr &= 0x00ffffff;
+                    banshee->ioBaseAddr |= val << 24;
+                    break;
+
                 case 0x19:
                 if (banshee->pci_regs[PCI_REG_COMMAND] & PCI_COMMAND_IO)
-                        io_removehandler(banshee->ioBaseAddr, 0x0100, banshee_ext_in, NULL, banshee_ext_inl, banshee_ext_out, NULL, banshee_ext_outl, banshee);
-                banshee->ioBaseAddr = val << 8;
+                        io_removehandlerx(banshee->ioBaseAddr, 0x0100, banshee_ext_in, NULL, banshee_ext_inl, banshee_ext_out, NULL, banshee_ext_outl, banshee);
+                banshee->ioBaseAddr &= 0xffff00ff;
+                banshee->ioBaseAddr |= val << 8;
                 if ((banshee->pci_regs[PCI_REG_COMMAND] & PCI_COMMAND_IO) && banshee->ioBaseAddr)
-                        io_sethandler(banshee->ioBaseAddr, 0x0100, banshee_ext_in, NULL, banshee_ext_inl, banshee_ext_out, NULL, banshee_ext_outl, banshee);
+                        io_sethandlerx(banshee->ioBaseAddr, 0x0100, banshee_ext_in, NULL, banshee_ext_inl, banshee_ext_out, NULL, banshee_ext_outl, banshee);
                 pclog("Banshee ioBaseAddr=%08x\n", banshee->ioBaseAddr);
 //                s3_virge_updatemapping(virge); 
                 return;

leilei
Posts: 1039
Joined: Fri 25 Apr, 2014 4:47 pm

Re: [Patch] Voodoo 3 WinUAE merge notes/patch

Post by leilei »

I wonder if this emulates Amiga's terrible Quake2 performance with one of those (where there's a vertex lighting hack that doesn't really help matters IIRC). I can't imagine the bus being too great for 3d acceleration (akin to using a laptop's docking station)
twilen
Posts: 6
Joined: Fri 14 Aug, 2020 9:21 am

Re: [Patch] Voodoo 3 WinUAE merge notes/patch

Post by twilen »

leilei wrote: Mon 21 Dec, 2020 1:20 am I wonder if this emulates Amiga's terrible Quake2 performance with one of those (where there's a vertex lighting hack that doesn't really help matters IIRC). I can't imagine the bus being too great for 3d acceleration (akin to using a laptop's docking station)
I don't know..

I don't emulate any timings in expanded configurations because almost all users simply want the fastest possible config when using RTG display boards (with the m68k JIT enabled, which does not support bus timing anyway). Only mostly unexpanded A500/A1200 configs support cycle-accurate mode, where it makes the most sense.

I also personally don't understand why anyone would want to run PC ports of programs on Amiga. Yes, I know it is a very popular hobby for some reason :)
(I never had RTG board back in the day. Hitting the hardware directly is the only proper Amiga way!)

btw, does Voodoo 4/5 have any new features, or does it "only" support larger VRAM? Just wondering if there is an easy way to get even more VRAM without having to support new 3D features (= slower emulation) in the Voodoo emulation. The Amiga Voodoo driver apparently supports up to Voodoo 5.
leilei
Posts: 1039
Joined: Fri 25 Apr, 2014 4:47 pm

Re: [Patch] Voodoo 3 WinUAE merge notes/patch

Post by leilei »

V4 changes several things:
- FXT texture compression (which never really caught on despite 3dfx's best attempts at creating a new open standard, IIRC only Alice and FAKK2 bothered with FXT)
- 32-bit rendering (without any screen filtering)
- Larger texture size limit
- Temporal full scene anti-aliasing (often claimed to be better than Geforce2's FSAA)
- some buffer effect to make some rough motionblur happen for one special outdated test version of quake3 only
- the ability to use multiple VSA chips for more scalability (Voodoo5 hype)
- heavy power requirements (V5)

Later Voodoo3 drivers for Windows (particularly "Voodoo Series") support V3/V4/V5, but that doesn't mean the V4 is trivial to emulate from there...
SarahWalker
Site Admin
Posts: 2054
Joined: Thu 24 Apr, 2014 4:18 pm

Re: [Patch] Voodoo 3 WinUAE merge notes/patch

Post by SarahWalker »

Also new combine modes (particularly for multitexture), general purpose scan-time buffer accumulation ('T-buffer'), rotated grid sampling for antialiasing...

It's the biggest set of architectural changes in the Voodoo line.
twilen
Posts: 6
Joined: Fri 14 Aug, 2020 9:21 am

Re: [Patch] Voodoo 3 WinUAE merge notes/patch

Post by twilen »

Ok, so it really isn't worth the trouble. 16M is more than enough anyway.
Only Permedia 2 remaining :)