In defence of uNESsential

[Image: old uNESsential]

About this document

uNESsential is an NES emulator written in QBasic, initially released in 1998 and updated again in 2022. This article consists of some more or less coherent memories from the time of the original development, some technical discussion on the emulator, explaining how and why it sucked, how it has been improved in the 2022 release, and how it still mostly sucks, along with general tangential ramblings. Enjoy!

Background

uNESsential began its life in 1998. The emulation scene was thriving and this period of time would later be regarded as the golden age of emulation. I was 16 and had been following the emulation scene for about two years, meaning I had struggled with PasoFami, iNES, NESA and then finally been blown away by NESticle. My understanding of computer hardware was limited to say the least, and my programming experience consisted of a couple of seriously crappy games written in a few different dialects of Basic.

Motivation

Based on sources now only available through the Wayback Machine, it seems my official motivation for writing an NES emulator was to see if music data in NES ROMs was stored in a standardised format, just like the tile data. The answer here, which should be painfully obvious to anyone with even the most basic understanding of NES hardware, is a resounding no, of course not. You idiot! I think I figured that out somewhere along the way, and in the end I never even attempted sound emulation (until now).

I believe another piece of motivation was that I had translated the game Little Nemo: The Dream Master into Swedish as a bit of an experiment in ROM hacking, and there was this one string, "DREAM", appearing at the beginning of each level, that was encoded differently from the rest of the game's script and I never managed to find its location in the ROM. I figured if I had my own NES emulator it should be easy to halt the game at the right moment and see where it was loading the data from. Unfortunately, Little Nemo uses the MMC3 chip and I never attempted any mapper emulation at all. I never did find out where that "DREAM" string was stored.

Realistically, though, the real motivation was that emulation in general was just the coolest thing in the world. I wanted to understand how it worked, and I wanted to be just as cool as my favourite emulator authors.

At this point I had started dabbling in C, using DJGPP and Allegro, but for one reason or another I decided to use QBasic for my NES emulator. In retrospect I think this was a good choice. NESticle was already out and a mediocre emulator written in C would not have garnered a lot of interest from the scene.

Possibly providing additional motivation was the fact that some people in fact did claim that it would be impossible to write an emulator in a language as limited as QBasic. These people obviously had no idea what they were talking about, and even though I would not learn of the term Turing completeness until a couple of years later, I did have an intuitive understanding of it and thought it would be fun to prove these nay-sayers wrong.

Sources of information

As already mentioned, I had basically no idea how a computer really works. I had looked at x86 assembly but could not make sense of it at the time. (Though, in my defence, the x86 architecture really does not make any sense.) The one thing that finally made things click for me was an interview with Y0SHi over at Archaic Ruins, my favourite emulation site at the time. It can still be found online.

It rather meticulously details the behaviour of a couple of 6502 instructions, in language that was on a level basic enough that I could actually wrap my head around it. Armed with this newfound knowledge on the LDA and INC instructions, I felt that I knew how a CPU worked and I could start writing my emulator. It can be noted that, in this interview, Y0SHi consistently uses the term "opcode" when he really means "instruction" and this carried over into my own language, as can be seen in the uNESsential source code and on the old homepage.

When it comes to the NES specifics, uNESsential was based entirely on Y0SHi's NES document, aka nestech.txt. It was not initially public but I have a vague memory of obtaining it early, not from Y0SHi himself but from some other (aspiring) emulator author. I initially looked at Marat's document too, but Y0SHi's seemed more up to date and was, again, worded in a way I could comprehend. I printed the entire thing at the school library and the staff there was not pleased. I tried to calm them down by claiming that it was for a school project. That was not really true at the time, but I sort of made good on that promise later by writing another NES emulator in C for a school project (but that's a different and less interesting story, and that emulator was never released publicly).

Having printed documentation may seem like an archaic practice, and in the current times of multiple monitors and multitasking operating systems it certainly is, but the thing to remember is that this was in early 1998. I had not yet switched to Linux and was running Windows 95, a crash-prone disaster of an OS. Emulators and games still typically targeted DOS because it just performed better and didn't crash randomly, meaning I spent a lot of time booted into DOS mode where, unless I had physical documentation, I would have to exit the QBasic IDE anytime I wanted to check the NES docs.

Design

CPU

As a result of my limited understanding of computers, uNESsential employs what you might call a wholly CPU-centric design. To me, a computer was mostly just a CPU, and it could make magical things happen by reading and writing to special registers. The effects of these reads and writes, I figured, could always be calculated on the fly. That a computer consists of multiple chips, all running in parallel, triggered by the same clock signal, had not yet occurred to me. Consequently, uNESsential essentially consists of a perpetual fetch, decode, execute loop for executing 6502 instructions, and at certain cycle counts a frame of graphics is drawn. Fortunately for me, this simplistic approach will actually get you pretty darn far on semi-old systems like the NES.

PPU

NES emulation in general had started to move away from tile-based rendering, opting instead for line-based (and later pixel-based) rendering to allow for things like raster-effects and split screen scrolling. I can't remember whether I fully understood the difference, but it seemed to me that the only way to draw the graphics with reasonable performance would be to use QBasic's GET and PUT functions to blit entire tiles to the screen, so I decided to go with a tile-based renderer.

An early version of the uNESsential documentation states:

uNESsential does not have a line by line graphics engine, and it
probably never will have, since that almost requires double buffering
or page flipping, which is impossible at decent speed in QBasic (In
VGA).

I have no idea why I thought I would need double buffering to do line-based rendering, but this at least shows that I was aware that line-based rendering was desirable and made a conscious decision to skip it anyway.

Funnily enough, the tile-based renderer I eventually implemented turns out to be somewhat clever and was one of very few positive surprises I found when going through the code 20 years later. More on this later.

Poor choices

Setting aside the fact that QBasic itself is a terrible choice for an emulator, there are a couple of rather obvious mistakes in the original uNESsential code.

CPU Memory map

The 6502 has a 16-bit address bus, for 64 kilobytes of address space. In my little mind it seemed obvious that this should be represented as an array of size 65536. In reality this makes little sense. This is the memory map, taken from nesdev.com:

 $0000-$07FF 2KB internal RAM
($0800-$1FFF Mirrors of $0000-$07FF)
 $2000-$2007 NES PPU registers
($2008-$3FFF Mirrors of $2000-2007)
 $4000-$4017 NES APU and I/O registers
 $4018-$401F APU and I/O functionality that is normally disabled
 $4020-$FFFF Cartridge space: PRG ROM, PRG RAM, mapper registers

So there is 2KB of RAM, a bunch of registers and the rest is cartridge ROM. Although cartridge space is given as $4020-$FFFF, it's usually just $8000-$FFFF, ie 32KB, meaning almost half of the address space is actually unused. Emulating this with a single 64KB array is wasteful for one thing, but it gets completely unworkable when you start considering games with memory mappers. Any time a game switches ROM banks you would have to copy upwards of 16KB of memory from the ROM file into this stupid array. What you really need is a layer of indirection where you decode addresses and access the correct ROM bank depending on the current state of the mapper chip.
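To make the "layer of indirection" concrete, here is a minimal sketch in Python (not QBasic, and not uNESsential's actual code). The class name `Bus`, the two 16KB windows and the `switch_bank` helper are all assumptions made up for illustration. It only handles RAM and PRG ROM, and it models a generic bank-switching cartridge rather than any specific mapper.

```python
BANK_SIZE = 0x4000  # 16KB PRG banks (assumed granularity for this sketch)


class Bus:
    """Hypothetical address decoder: nothing is copied on a bank switch."""

    def __init__(self, prg_rom: bytes):
        self.ram = bytearray(0x800)      # 2KB internal RAM
        self.prg_rom = prg_rom           # entire PRG ROM as loaded from the file
        n_banks = len(prg_rom) // BANK_SIZE
        # Bank numbers mapped at $8000 and $C000. With a single 16KB bank
        # both windows point at bank 0, which gives mirroring for free.
        self.banks = [0, n_banks - 1]

    def read(self, addr: int) -> int:
        addr &= 0xFFFF
        if addr < 0x2000:
            return self.ram[addr & 0x07FF]       # RAM and its mirrors
        if addr >= 0x8000:
            window = (addr >> 14) & 1            # 0 = $8000, 1 = $C000
            offset = addr & (BANK_SIZE - 1)
            return self.prg_rom[self.banks[window] * BANK_SIZE + offset]
        return 0  # PPU/APU registers and cartridge RAM are left out here

    def write(self, addr: int, value: int) -> None:
        addr &= 0xFFFF
        if addr < 0x2000:
            self.ram[addr & 0x07FF] = value & 0xFF

    def switch_bank(self, window: int, bank: int) -> None:
        # A bank switch changes one small integer instead of copying 16KB.
        self.banks[window] = bank
```

The price is a couple of comparisons per access, which is exactly the overhead I was afraid of back then.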

In the immortal words of Y0SHi:

If you're going to be programming an emulator in C or C++, please be
familiar with pointers. Being familiar with pointers will help you out
severely when it comes to handling mirroring and VRAM addressing. For
you assembly buffs out there, obviously pointers are nothing more than
indirect addressing -- it's easier to change a 32-bit value than to
swap in and out an entire 64K of data.

This might seem like an unnecessary thing to point out, what kind of programmer doesn't understand pointers? But it was actually completely warranted. The emulation scene was full of silly kids like me trying their hands at things way out of their respective leagues. And no, I was not familiar with pointers, because they do not really exist in QBasic.

The first drawback, wastefulness, never really occurred to me. 64KB didn't sound like much and I was hell-bent on having the entire address space as a single array, feeling that decoding the address and indexing into multiple different arrays depending on its value would be too slow. The second drawback, bank switching, did occur to me (eventually), but I had an excellent solution: I put it off and eventually decided that emulating memory mappers was out of scope for uNESsential.

Sadly, I immediately ran into a limitation of QBasic here. QBasic is a real-mode application, meaning it uses 16 bit registers and sees memory as divided into segments of 64KB each. Arrays are typically indexed using a single 16-bit register, meaning they can at most contain 65536 bytes, and indeed, that is the hard limit for QBasic arrays. At first this sounds perfect, that's just enough for the 6502 address space. The problem is just that QBasic has no single byte datatype. The smallest datatype is INTEGER which is a signed 16 bit value, meaning an array of 65536 elements would require 131072 bytes of memory, or two whole segments. One crappy way around this might be to use an array of 32768 integers and store two bytes in each integer. This would of course slow everything down as some bit-mangling would need to be done for every access to the array (except when accessing properly aligned 16 bit values, but that doesn't happen nearly enough for this to be a good idea).
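For reference, the "two bytes per integer" workaround would look roughly like the sketch below. It is written in Python for illustration, with `array('h')` standing in for a QBasic INTEGER array; the names `peek` and `poke` are hypothetical. The bit-mangling on every access is the cost mentioned above.

```python
from array import array

# 32768 signed 16-bit integers hold the full 64KB address space.
mem = array('h', [0] * 32768)


def peek(addr: int) -> int:
    """Read one byte: even addresses use the low half, odd the high half."""
    word = mem[addr >> 1] & 0xFFFF          # view the signed value as unsigned
    return (word >> 8) & 0xFF if addr & 1 else word & 0xFF


def poke(addr: int, value: int) -> None:
    """Write one byte without disturbing its neighbour in the same word."""
    word = mem[addr >> 1] & 0xFFFF
    if addr & 1:
        word = (word & 0x00FF) | ((value & 0xFF) << 8)
    else:
        word = (word & 0xFF00) | (value & 0xFF)
    # Convert back to the signed range that a 16-bit INTEGER can hold.
    mem[addr >> 1] = word - 0x10000 if word & 0x8000 else word
```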

I chose another, possibly even crappier workaround. I used a file instead of memory.

PPU Memory

The PPU has 16KB of memory which would fit comfortably in a QBasic array, and yet for some reason I still decided to keep it too in a file. I don't remember my reasoning but there is a clue in the source code:

vramposi = vramposi + okn
IF vramposi > 32767 THEN vramposi = vramposi - 32768

This is the code that updates the VRAM pointer after a write to VRAM. Never mind that this is a terrible way to implement wrapping, it suggests that I somehow got into my head that the PPU had 32KB of memory. No problem, you might say, because a 32KB array is still possible. Well, turns out that wasn't quite true. The largest allowed array size is actually 32766!
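As a point of comparison, here is a small Python sketch (not the original QBasic) of the wrapping as written above next to the more usual approach of masking the address. The function names are made up, and the 16KB size is the PPU address space stated at the top of this section.

```python
VRAM_SIZE = 0x4000  # 16KB PPU address space


def advance_subtract(pos: int, step: int) -> int:
    """The original logic: wraps at 32768, i.e. assumes 32KB of VRAM."""
    pos += step
    if pos > 32767:
        pos -= 32768
    return pos


def advance_masked(pos: int, step: int) -> int:
    """Masking keeps the pointer inside 16KB with a single AND."""
    return (pos + step) & (VRAM_SIZE - 1)
```

With the old version, a pointer at $3FFF advances to $4000, which is outside the real 16KB; the masked version wraps back to $0000.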

Palette

The NES will usually not display more than 25 colours on screen, from a total of 56 different colours. The only video mode in QBasic with enough colours to emulate this is "screen 13", otherwise known as mode 13h, ie 320x200 with 256 colours on screen from a total of 262144. However, the NES resolution is 256x240 meaning 40 lines from the NES screen would have to be sacrificed in this mode. I thought this was unacceptable and I also wanted to keep some other things on screen, including the pattern tables, so I decided that I would prefer to compromise with the palette and went for "screen 12", ie 640x480 with 16 colours on screen, again from a palette of 262144.

But here is where it gets weird. Although I could only display 16 colours at a time, I had a total of 262144 to choose from, and I could easily pick any 16 colours from the NES palette and assign them to my VGA palette, but that's not what I did. I instead created a 16-colour approximation of the 56 colour NES palette. It's not a great looking palette, but it's sort of a reasonable idea, because then I could just set my VGA palette to this crappy NES palette and the NES games would just look like they had been converted down to 16 colours, right? Yeah, but that's not what I did and it would never have been possible with the way the tile rendering works. Instead, the VGA palette must be reconfigured on every frame to match the current background palette of the NES. If there are repeat colours in the NES background palette, there will be repeats in the VGA palette too, effectively wasting colour entries and lowering the number of possible on-screen colours for that frame. So when the palette is configured dynamically for each frame, why use this crappy 16 colour palette, why not just pick colours from the real NES palette? I can't be fully sure but I think I was trying to be fair to the sprites.

There are four sprite palettes on the NES, each containing three colours (plus transparency), for 12 colours in total. Seeing as the entire VGA palette was already configured to match the NES background palette, the sprites would have to make do with these already configured colours, and I implemented this in the most naive way possible. When a sprite needed a particular colour, it would check the current (VGA) palette, and if the colour was available there, it was reused for the sprite, and if not, the sprite would default to colour entry 0 (the NES background colour, oftentimes black) for that particular colour. And this is where having a 16-colour NES palette kind of pays off. When there are only 16 colours to choose from, it's reasonably likely there will be some overlap between the background and sprite palettes, allowing the sprites to reuse those overlapping colours. With 56 colours, the overlap is likely to be minimal, resulting in mostly black sprites.

Some positives

On the whole, uNESsential is a pretty crummy piece of software. It's obvious from the code that it is written by someone who was learning as they went along, but there are a couple of bright spots, where the code surprised me in a positive way.

6502 emulation

The CPU emulation is surprisingly solid. I ran kevtris's nestest ROM and uncovered the following errors:

Four opcodes were missing: pre and post indexed SBC and EOR. I haven't found a single game that breaks because of this and if anything I'm surprised there weren't more instructions missing seeing as I typically only added features to uNESsential when required by a game. Completeness or accuracy beyond just running games was never a goal.

Non-maskable interrupts placed a slightly incorrect (off by one) return address on the stack. This was also mostly harmless since the RTI instruction handled it correctly (meaning RTI was technically broken too).

BVC/BVS were incorrectly implemented: I had just copied BCC/BCS and forgotten to change which flag they checked. This is a bit of a sad story. The single thing that caused me the most headache in the entire emulator was the handling of the overflow (V) flag in ADC and SBC (particularly SBC I believe). These days you can easily google it, and it probably feels stupidly simple to anyone who has done some arithmetic on the 6502, but to me back then it was a mystery. I would write test ROMs full of SBC instructions and step through them in LoopyNES's debugger, trying to figure out the logic of the V flag. If nestest can be trusted here, I did finally get it right, but with BVC and BVS being broken, it was all for nought.
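For anyone still puzzled by the V flag, this is the usual formulation as a Python sketch (not the uNESsential source; the function names and tuple return are just for illustration). It covers binary mode only, which is all the NES needs. Overflow is set when both inputs have the same sign and the result has the opposite sign, and SBC is simply ADC with the operand's bits inverted.

```python
def adc(a: int, m: int, carry_in: int):
    """Add with carry. Returns (result, carry_out, overflow)."""
    total = a + m + carry_in
    result = total & 0xFF
    carry_out = 1 if total > 0xFF else 0
    # V is set when a and m agree in bit 7 but the result differs from a.
    overflow = 1 if (~(a ^ m) & (a ^ result) & 0x80) else 0
    return result, carry_out, overflow


def sbc(a: int, m: int, carry_in: int):
    """Subtract with borrow: identical to ADC of the inverted operand."""
    return adc(a, m ^ 0xFF, carry_in)
```

For example, `adc(0x50, 0x50, 0)` adds two positive numbers and gets a negative-looking $A0, so V is set.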

Background rendering

The background rendering is the one piece of the code that is slightly clever and actually manages to leverage the QBasic API in a meaningful way. If you are reading this, there is a good chance that you already know how NES graphics work. If not, I would recommend you take a look at a proper article on it, perhaps this one, or this one, but for completeness, and in case those sites disappear, here's a very quick recap:

Graphics are entirely tile-based. Tiles are 8x8 bitmaps with 2 bits per pixel, stored on the cartridge, typically in what's known as CHR-ROM, but sometimes in CHR-RAM. The tiles are grouped into sets of 256 and each set is called a pattern table. Tiles are laid out on the screen by selecting a pattern table and writing tile indices within that pattern table to a piece of VRAM called the name table. The NES resolution is 256x240 pixels, or 32x30 tiles, so one name table is just 32 * 30 = 960 bytes.
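To illustrate the 2 bits per pixel layout, here is a small Python sketch (the function name is made up) that decodes one tile from raw pattern table data. It relies on the standard NES layout: each tile takes 16 bytes, the first 8 holding the low bit of each row and the next 8 holding the high bit, with bit 7 being the leftmost pixel.

```python
def decode_tile(chr_data: bytes, index: int):
    """Return an 8x8 list of rows, each pixel a value from 0 to 3."""
    base = index * 16
    rows = []
    for y in range(8):
        lo = chr_data[base + y]        # low bitplane for this row
        hi = chr_data[base + y + 8]    # high bitplane for this row
        row = []
        for x in range(8):
            bit = 7 - x                # bit 7 is the leftmost pixel
            row.append((((hi >> bit) & 1) << 1) | ((lo >> bit) & 1))
        rows.append(row)
    return rows
```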

So while the name tables and pattern tables alone are enough to cover the screen with tiles, they only produce a 2bpp (ie 4-colour) image, and that's where attribute tables come in. Attribute tables, also located in VRAM, provide an additional 2 bits of colour (the two upper bits) to each 16x16 pixel area on the screen. While the combination of attribute table and pattern table data gives each pixel four bits of colour, the way it's typically thought of is that the attribute table bits select one of four background palettes (for a 16x16 area on the screen) and the pattern table bits select a colour within this palette.

The trade-off here is of course that by adding a tiny bit of extra VRAM (a single attribute table is just 64 bytes) and accepting the restrictions it implies when it comes to how colours can be distributed on the screen, the size of CHR-ROM could be cut in half (assuming the end goal is a 4bpp image). In practice, the number of colours on-screen is not quite 16 because colours 4, 8 and 12 in the background palette are all mirrors of colour 0.

Calculating the colour value of every pixel on the screen individually would be painfully slow using QBasic. uNESsential instead uses the GET and PUT functions to "blit" one tile at a time to the screen. Ideally, the tiles should have been placed in an in-memory array but GET and PUT cannot be used with arrays of bitmaps, so, instead, the entire pattern tables are rendered to an area on the screen right when the ROM is loaded, and when a tile is needed on the emulated TV screen, it is first copied from the on-screen pattern table with GET and then immediately pasted into the correct location with PUT.

Early versions of uNESsential ignored the attribute tables entirely and just drew the tiles using the first four colours of the standard VGA palette. Here, a and b are the actual screen offsets of the tile within the pattern table and x and y are the coordinates of the tile on the NES screen:

GET (a, b)-(a + 7, b + 7), tile
PUT (x, y), tile, PSET

The key thing that made it possible to support the extra colour bits from the attribute tables is the final parameter of the PUT function. In the example above, PSET is used which means that each pixel on the screen is just replaced with whatever is stored in the tile bitmap, but PUT also supports the boolean operations AND, OR and XOR, allowing bitmaps to be blended with pixels already present on the screen. So what later versions of uNESsential do is to first draw the attribute tables to the screen as a bunch of 16x16 rectangles with colours 0, 4, 8 or 12. The pattern table bitmaps can then be PUT on top of these with drawing mode OR to combine the pattern table bits with the attribute bits. It's really quite efficient. For QBasic.
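The OR trick described above can be modelled in a few lines of Python. This is a sketch of the idea rather than the QBasic code, and `composite` and `resolve` are hypothetical names. The rectangle contributes the two upper bits, the tile contributes the two lower bits, and ORing them gives the final 4-bit colour index.

```python
def composite(attr_bits: int, tile_pixels):
    """attr_bits is 0..3 for the area; tile_pixels is 8x8 rows of 0..3."""
    base = attr_bits << 2                       # 0, 4, 8 or 12
    # Step 1: the filled rectangle drawn from the attribute table.
    screen = [[base] * 8 for _ in range(8)]
    # Step 2: the tile is combined with what is on screen, like PUT ..., OR.
    for y in range(8):
        for x in range(8):
            screen[y][x] |= tile_pixels[y][x]
    return screen


def resolve(colour: int) -> int:
    """Colours 4, 8 and 12 are mirrors of colour 0 in the background palette."""
    return 0 if (colour & 3) == 0 else colour
```

Because the two inputs never share a set bit, OR behaves exactly like addition here, which is why a plain bitwise blit is enough.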

This is the SMB title screen separated into attribute blocks and tiles:

[Image: SMB title attributes] [Image: SMB title tiles]

The attribute blocks look a bit messy but it's important to remember that these colours (0, 4, 8, 12) are all one and the same in the NES palette. ORing these two images together produces the following:

[Image: SMB title attributes and tiles]

And applying the (unfortunately terrible approximation of the) NES palette produces the end result:

[Image: SMB title old]

Interesting optimisation efforts

QBasic does no optimisation on your code. So while in a language like C, you can introduce temporary variables and break long statements up into multiple smaller ones just for code clarity without affecting performance, in QBasic it comes with a cost. I did not really go out of my way trying to get optimal performance out of uNESsential, but I found this attempt at optimisation that I thought was a bit cute:

cxb = hscroll AND 15
cyb = vscroll AND 15
...
LINE ((cxb XOR 15) + a * 16, (cyb XOR 15) + b * 16)-(..),...

This is the code to draw an attribute block, ie a 16x16 rectangle filled with a single colour. cxb and cyb are the lower 4 bits of the scroll values. So say cxb is 3, that means every attribute block (and every tile) needs to be shifted three pixels to the left when placed on the screen. (The upper bits of the scroll values are what's known as "coarse" scroll and they are used to index within the attribute tables, but do not result in any pixels being shifted).

The xor here caught me completely off guard. It just looks out of place. But what happened here is that the NES screen is located at coordinates 15,15 on the PC screen and I wanted to do 15 - cxb, but I must have realised that subtracting something from a number with all bits set (like 15 which in binary is 1111) is the same as xoring the two numbers, or more generally:

(a & b) = a => (a ^ b) = (b - a)

And, conversely, for addition:

(a & b) = 0 => (a ^ b) = (a | b) = (a + b)

Basically, these are instances of addition and subtraction of binary numbers where no carrying or borrowing is involved. I figured an xor should be quicker than a subtraction and decided to use that. It's an idea with some merit. On a transistor level, xor is a way simpler operation than subtraction, but the sad reality is that they use the same number of cycles on an x86. I wouldn't have known that, but the fact that they are also the same number of cycles on a 6502 should perhaps have been a clue.
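Both identities are easy to check by brute force. The Python sketch below (function names are made up) tests every pair of 4-bit values and also confirms that the XOR expression matches 15 - cxb for every possible scroll value.

```python
def check_identities(bits: int = 4) -> bool:
    """Exhaustively verify the borrow-free and carry-free identities."""
    n = 1 << bits
    for a in range(n):
        for b in range(n):
            # Every set bit of a is also set in b: no borrow can occur.
            if (a & b) == a and (a ^ b) != (b - a):
                return False
            # a and b share no set bits: no carry can occur.
            if (a & b) == 0 and not ((a ^ b) == (a | b) == (a + b)):
                return False
    return True


def screen_offset(scroll: int) -> int:
    """The expression from the LINE statement: (scroll AND 15) XOR 15."""
    return (scroll & 15) ^ 15
```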

Compatibility

The final uNESsential release of the 90s did a reasonable job of emulating mapper 0 games. Most games would at least display a title screen but it was difficult to thoroughly test the games due to how slow they ran. I would get around 2 fps on my Pentium 2 (and even as CPU speeds shot into GHz territory in the early 00s, uNESsential's performance did not improve much). Most importantly, however, Super Mario Bros seemed to run fine (though suffering from the black sky bug that many early NES emulators produced). At that point I started to lose interest. I did attempt a rewrite of the cpu emulation, using nested if-statements to do a binary search for the current opcode instead of a select statement, but I introduced a lot of bugs in the process and never found the motivation to iron them out.

Fixing uNESsential

In early 2021 I discovered QB64, a modern reimplementation of QBasic. It's super-awesome and if you have any interest in QBasic I highly recommend you check it out. It even reimplements the good old QBasic IDE, with some optional improvements like syntax highlighting. Naturally I had to try uNESsential on it and was delighted to find first of all that it just worked right out of the box, but, even more amazingly, that games now ran at way beyond full speed. With no particular goal in mind I started fixing bugs and other things that annoyed me in the old release. There turned out to be quite a bit of low-hanging fruit and a lot could be accomplished with little effort. It actually makes me a little bit sad that I abandoned the project back in 1999. I'm obviously a better developer now but a lot of this stuff I could definitely have pulled off back then too.

I should point out that while QB64 contains a huge number of extensions to modernise QBasic and take advantage of modern hardware, I have no interest in using those. Although I strongly recommend you use the QB64 binaries to run it today, uNESsential remains compatible with QuickBasic 4.5, and a working DOS binary is still provided.

Optimisations

RAM and ROM are now in memory and not on disk. This is probably the single biggest optimisation that was possible.
Datatypes are now declared almost everywhere. This is really QBasic 101: always use INTEGERs where possible, the default datatype is SINGLE and it is super-slow.
Keyboard input is now checked once per scanline rather than after every single instruction.

Bugfixes

Lots. For instance, the missing instructions have been added, BVC and BVS have been fixed, the indirect JMP bug has been implemented, zero-page wrapping for pre and post indexed addressing modes has been added, palette mirroring (the black SMB sky bug) has been implemented.
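Two of these fixes are about address calculations that wrap in unintuitive ways, so here is a Python sketch of the documented 6502 behaviour (not the uNESsential code; the function names are hypothetical and `mem` is assumed to be a flat 64KB byte buffer). The indirect JMP never carries into the high byte of the pointer, and the indexed zero-page modes always fetch their pointer from within page zero.

```python
def jmp_indirect_target(mem, ptr: int) -> int:
    """JMP ($xxFF) reads its high byte from $xx00, not from the next page."""
    lo = mem[ptr]
    hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    return lo | (mem[hi_addr] << 8)


def indexed_indirect_addr(mem, operand: int, x: int) -> int:
    """(zp,X), pre-indexed: operand + X wraps within the zero page."""
    zp = (operand + x) & 0xFF
    return mem[zp] | (mem[(zp + 1) & 0xFF] << 8)


def indirect_indexed_addr(mem, operand: int, y: int) -> int:
    """(zp),Y, post-indexed: the pointer fetch wraps, then Y is added."""
    base = mem[operand] | (mem[(operand + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF
```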

Other improvements

PPU mirroring

uNESsential deliberately ignored mirroring of name and attribute tables and implemented a full four screen layout in the PPU. This was mostly out of laziness but I also thought mirroring would have too much of a performance impact. Many games do run fine without mirroring because they never access or display the mirrored name tables, but many games break at least to some extent and I just never noticed because, again, testing was really slow.

[Image: icehockeyold]

Notice the missing crowd row at the bottom on the left.
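Mirroring turns out to be cheap to implement. The Python sketch below (the function name and the string constants are assumptions, not anything from uNESsential) maps a name table address in the $2000-$2FFF range to an offset in physical VRAM for the two standard arrangements, with the old four-screen behaviour as the fallback.

```python
def nametable_offset(addr: int, mirroring: str) -> int:
    """Map a PPU name table address to an offset in physical VRAM."""
    addr = (addr - 0x2000) & 0x0FFF
    table = addr >> 10           # logical name table 0..3
    offset = addr & 0x03FF       # position within that 1KB table
    if mirroring == "vertical":      # $2000 = $2800 and $2400 = $2C00
        physical = table & 1
    elif mirroring == "horizontal":  # $2000 = $2400 and $2800 = $2C00
        physical = table >> 1
    else:                            # four-screen: every table is distinct
        physical = table
    return physical * 0x400 + offset
```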

Background palettes

This is what a reasonable NES palette looks like:

[Image: Good NES palette]

And this is the monstrosity implemented by the original uNESsential:

[Image: Bad NES palette]

As already detailed above, there was really no good reason uNESsential couldn't include the full NES palette, it just can't display all of the colours at once. uNESsential now includes a much nicer palette (the one above to be precise) in standard .pal format, meaning you can even replace it yourself, and this is probably the most striking improvement. It kind of looks like a real emulator now!

[Image: old SMB title]

Sprite palettes

Seeing as the entire VGA palette is consumed by the background layer, the sprites have to make do with whatever colours are present in the background palette. This constraint still holds but instead of the old naive algorithm for picking sprite colours that occasionally resulted in mostly black sprites, the new uNESsential uses sums of squared distances between RGB values to generate a full sprite palette by picking the closest matching colours from the background palette. Again something I contemplated back in the day but it seemed difficult and slow. 17 year old me just did not have a great understanding of things like computational complexity and profiling.

[Image: old Balloon Fight]

Here, in the new version, the bad guys end up with a slightly too dark shade of blue making them a bit difficult to discern, but overall the new version is pretty close to the real thing whereas the old version is just blatantly incorrect.
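The difference between the old and new sprite colour lookup can be sketched in Python as follows. This is an illustration under my own assumptions (colours as RGB tuples, hypothetical function names), not a translation of the QBasic code.

```python
def old_lookup(target, palette) -> int:
    """Old behaviour: exact match or fall back to entry 0."""
    return palette.index(target) if target in palette else 0


def nearest_index(target, palette) -> int:
    """New behaviour: pick the entry with the smallest squared RGB distance."""
    def dist2(colour):
        return sum((a - b) ** 2 for a, b in zip(colour, target))
    return min(range(len(palette)), key=lambda i: dist2(palette[i]))
```

With at most 16 palette entries and 12 sprite colours, that is under 200 distance calculations per palette change, which is hardly the performance problem I imagined.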

Split screen scrolling

This one I always figured was impossible or at least pointless with a tile-based renderer, but it's actually quite simple and effective. The most common use of split screen scrolling is to keep a static status bar somewhere on screen, and for those cases the split will almost always occur between tiles, and usually even between attribute blocks, so all I had to do was sample the scroll registers every 16 lines and then account for those values during rendering. I also finally implemented Loopy's "The skinny on NES scrolling". I couldn't wrap my head around it when it came out, or possibly I just couldn't believe the internal implementation of scrolling was so complicated, I figured I was just misunderstanding the docs.
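The sampling idea can be shown with a tiny Python model. The names are hypothetical and `scroll_at_line` stands in for whatever the scroll register contains while a given scanline is being emulated.

```python
def sample_scroll(scroll_at_line, lines: int = 240, band: int = 16):
    """Record the scroll value once per 16-line band of the frame."""
    return [scroll_at_line(y) for y in range(0, lines, band)]


def scroll_for_line(samples, y: int, band: int = 16) -> int:
    """During rendering, each band of tiles uses its own sampled value."""
    return samples[y // band]
```

For a game with a 32-line status bar, the first two samples hold the status bar's scroll value and the remaining 13 hold the playfield's.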

[Image: old SMB SMB]

This effect is also way more common than I realised. For example, on the Ice Hockey title screen, those 12 guys below the logo are scrolled in independently of the rest of the screen. It never even occurred to me that they were missing when working on the old uNESsential.

[Image: old Ice Hockey title]

Racing games that update scroll values on every scanline obviously just look hilarious, but at least they are somewhat playable now.

[Image: Slalom]

Framerate measuring and limiting

I never measured the framerate of the original uNESsential but it was obviously dystopian, probably around two fps, and there was certainly no need to implement any kind of framerate limiting. With QB64, even the original uNESsential code goes way beyond 60 fps, and with the recent optimisations it easily reaches 300 fps.

Learning to program in QBasic, timing was always something I struggled with, and really there is no good way to do it. You can access the system clock but it's only updated with a frequency of 18.2Hz. In fact, this is not just a QBasic problem, it's a PC hardware quirk and there are ways around it, but a lot of DOS games actually run at 18 fps just because it's convenient.

I would inevitably end up using busy-wait loops to control the pacing of my games. In my early games they had to be configured by the user, but I eventually attempted to calibrate them automatically. But it was all rather handwavy and not something I could use here. As a side note, Nibbles does this, also in a flawed way, which is why it won't run on a reasonably modern machine:

startTime# = TIMER              ' Calculate speed of system
FOR i# = 1 TO 1000: NEXT i#     ' and do some compensation
stopTime# = TIMER
speed = speed * .5 / (stopTime# - startTime#)

The problem here is that due to the low resolution (18.2Hz) of the timer and the relatively short for-loop, stopTime - startTime always becomes 0 and the game crashes with a division by zero error.
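The failure is easy to reproduce with a simulated clock. In the Python sketch below, `coarse_timer` is a made-up stand-in for TIMER that only advances 18.2 times per second, and the loop durations are invented numbers.

```python
TICK_HZ = 18.2  # update rate of the system clock


def coarse_timer(true_seconds: float) -> float:
    """A clock that only changes value 18.2 times per second."""
    return int(true_seconds * TICK_HZ) / TICK_HZ


def nibbles_elapsed(start: float, loop_seconds: float) -> float:
    """What stopTime - startTime evaluates to for a loop of a given length."""
    return coarse_timer(start + loop_seconds) - coarse_timer(start)
```

A loop that finishes in a tenth of a millisecond almost always starts and stops within the same tick, so the measured time is exactly 0 and the division blows up.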

Even if you are happy with 18 fps, achieving it in QBasic is unnecessarily complex. The SLEEP function sadly only works with whole seconds. One of my more creative ideas to work around this was to create a delay by calling the SOUND function to play a really high-pitched sound outside of human hearing range for a single timer tick. I would still be limited to 18 fps but it was a simple way to produce a predictable delay. Unfortunately it made the speaker produce horrible noises on some computers. Turns out you can give SOUND a frequency of 0 and it will actually just sleep, but that wasn't documented and I never figured it out (until now).

Anyway, if you've run the new uNESsential with framerate limiting, you will (hopefully) have noticed that it hovers around 60 fps quite reliably. It achieves this using a solution that I think is equal parts beautiful and ridiculous. When uNESsential starts, it spends one second in this for-loop:

future! = TIMER + 1
second = 0
FOR x& = 0 TO 2147483647
    second = second + 1
    IF TIMER > future! THEN EXIT FOR
NEXT x&

This is to figure out how many iterations of the for-loop are needed to waste exactly one second of clock time. Then, after one second of emulation, it figures out how much time it needs to waste on each frame in order to bring the framerate down to 60 fps:

IF fps <> 60 THEN
   waste = waste + second / 60 - second / fps
   IF waste < 0 THEN waste = 0
END IF

Then after every vblank, the following loop runs:

c& = 0
future = TIMER + 1
FOR w& = 0 TO waste
    c& = c& + 1
    IF TIMER > future THEN EXIT FOR
NEXT w&

This loop is deliberately almost identical to the initial one that calculated the second value. The timer check is only there to slow the loop down to exactly the same speed as the initial loop (but it has the perfectly reasonable side effect of terminating the loop after one second if the computer has become bogged down or if waste somehow has become unreasonably large).
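To see why the adjustment works, here is the same formula as a Python function (the name `adjust_waste` and the `target` parameter are mine). Since `second / fps` is the measured cost of one frame in loop iterations, including the current waste, and `second / 60` is the cost a frame should have, the difference is exactly the correction needed.

```python
def adjust_waste(waste: float, second: float, fps: float, target: float = 60):
    """Return the new number of busy-loop iterations to burn per frame."""
    if fps != target:
        # desired frame cost minus measured frame cost, both in iterations
        waste = waste + second / target - second / fps
        if waste < 0:
            waste = 0
    return waste
```

In an idealised model where one second is 6000 iterations and emulating a frame costs 20 of them (300 fps), the first adjustment yields a waste of 80 iterations, for 100 per frame, which is 60 fps.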

All in all, it's fittingly stupid, it works, and it would have blown the mind of teenage me. It's actually doubly stupid. QB64 has better timing functions that would let me limit the framerate to 60 fps in a more stable way and without wasting CPU cycles, but I can't use them because that would break compatibility with QuickBasic 4.5. At the same time, the framerate limiting code is completely worthless in QuickBasic 4.5, because QuickBasic 4.5 will never ever get remotely close to running uNESsential at 60 fps. Blah!

Sound

Emulating sound would have been pointless in the original uNESsential given how slowly it ran, but as I was writing this section I realised that it would be perfectly doable now, so I went ahead and implemented it. There are two main ways of producing sound in QBasic, SOUND and PLAY. SOUND can play any frequency, but it locks up the CPU for the duration of the sound, and the shortest duration it supports is 1/18th of a second, so any use of it would completely ruin the framerate. PLAY is intended for music and takes strings in a specific format, like this:

PLAY "o1l16ab>d<bl8>f#.f#.e.p8.l16<ab>d<bl8>e.e.d.l16c#l8<b"

Interestingly, and very much to my advantage, PLAY can run in the background while the program keeps running, and it is actually able to play notes that are shorter than 1/18th of a second. You can specify a tempo between 32 and 255, giving you at most 255 quarter notes per minute, and the shortest supported note length is a sixty-fourth, so 255 * 64 / 4 = 4080 notes per minute or 4080 / 60 = 68 notes per second. Ideally I would have liked to be able to play 180 notes per second, which would allow all three channels to play on every frame, but anything above 60 is still acceptable, allowing one channel to play on each frame. (Playing 180 notes per second would also mean that each note gets such a short duration that the lower octaves would not even work; even 60 is probably pushing it.) If the upcoming channel is silent, the emulator lets one of the others play instead to avoid unnecessary silence. A PC speaker produces a square wave, so it is actually a good fit for the two square channels of the NES, but there is no way to change the volume or duty cycle, so those settings are ignored. The frequency sweep function is supported, but since PLAY is limited to the 12-tone scale, it sounds slightly hilarious.
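
The note-rate ceiling can be checked with a few lines of QBasic. This is only a sketch of the arithmetic above; the variable names are invented, and the PLAY string at the end is an illustrative example of the fastest settings, not a string taken from uNESsential.

```qbasic
' Upper bound on PLAY's note rate, using the limits quoted above.
tempo& = 255                            ' maximum tempo, quarter notes per minute
shortest& = 64                          ' shortest note length: a sixty-fourth
perMinute& = tempo& * shortest& \ 4     ' sixteen sixty-fourths per quarter note
PRINT perMinute&; "notes per minute"        ' 4080
PRINT perMinute& \ 60; "notes per second"   ' 68

' The same settings as a PLAY string: MB = background playback,
' T255 = tempo, L64 = note length, O3 = octave, C = the note itself.
PLAY "MB T255 L64 O3 C"
```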

An interesting consequence of how the PLAY function works is that if the framerate exceeds 68, the music will not be able to keep up with the gameplay and the internal music buffer will start to grow. In classic QB this would not be a problem: the music buffer is tiny, and once it is full, calls to PLAY start blocking until there is enough space again. In fact, this could even work as a way to limit the framerate. In QB64, however, the music buffer has no size limit, so it would just grow indefinitely and the music would get ever more out of sync with the gameplay. The way uNESsential handles this is that no music is played if the current framerate is above 64.
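
Put together, the per-frame sound logic might look roughly like the sketch below. It is a guess at the shape of the idea rather than the emulator's actual code: chanNote$, nextChan% and the note letters are hypothetical, and fps is assumed to be measured elsewhere.

```qbasic
' Hypothetical data: one PLAY note per channel, "" meaning silent this frame.
DIM chanNote$(2)
chanNote$(0) = "C"      ' square 1
chanNote$(1) = ""       ' square 2 is silent
chanNote$(2) = "G"      ' triangle

fps = 60                ' measured framerate (assumed to be tracked elsewhere)
nextChan% = 1           ' channel whose turn it is this frame

' Skip sound entirely above 64 fps so the PLAY buffer cannot run away.
IF fps <= 64 THEN
    ' Start at the scheduled channel; fall through to the others if silent.
    FOR i% = 0 TO 2
        c% = (nextChan% + i%) MOD 3
        IF chanNote$(c%) <> "" THEN
            PLAY "MB T255 L64 O3 " + chanNote$(c%)
            EXIT FOR
        END IF
    NEXT i%
END IF
nextChan% = (nextChan% + 1) MOD 3   ' rotate for the next frame
```

With the values above, it is the silent square 2 channel's turn, so the loop moves on and the triangle's note is queued instead.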

Remaining issues

Controls

The biggest annoyance by far is how controller emulation works by toggling NES buttons on and off. You might think it was deliberately implemented that way because it makes sense when games are running at 1-2 fps and keeping the keys held down would just get annoying, but it is just as much a result of the notoriously limited keyboard functions of QBasic. The INKEY$ function is almost certainly implemented using the BIOS keyboard functions, meaning that you can detect keypresses but not releases. On the other hand, QBasic does provide PEEK, POKE, INP, and OUT functions that allow you to manipulate memory and ports and accomplish almost anything by talking directly to the PC hardware. I decided when starting the project that I would avoid those functions, because I wanted to stick to the QBasic API, and while I still think that's a reasonable restriction, I'm starting to consider making an exception for the controls. There's really no other way to make the games truly playable.
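
For illustration, toggle-style input built on INKEY$ alone could be sketched as follows. The key bindings and the held% array are hypothetical and not necessarily how uNESsential maps its controls.

```qbasic
' Hypothetical key bindings and state array; not uNESsential's own.
DIM held%(1)            ' 0 = button A, 1 = button B; -1 means held down
PRINT "z toggles A, x toggles B, Esc quits"
DO
    k$ = INKEY$         ' returns "" when no key is waiting
    SELECT CASE k$
        CASE "z": held%(0) = NOT held%(0)
        CASE "x": held%(1) = NOT held%(1)
    END SELECT
    IF k$ <> "" THEN PRINT "A:"; held%(0); " B:"; held%(1)
LOOP UNTIL k$ = CHR$(27)
```

Because INKEY$ only ever reports that a key was pressed, each press can do no more than flip the stored state; there is nothing in this loop that could notice a release.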

Mapper emulation

It's just not a good time, for at least two reasons. First, the proper way to handle PRG ROM would be to place the ROM banks in a two-dimensional array. That way it would be easy to index into the correct ROM bank based on the current mapper state. But that just doesn't work. Even a 2 by 16384 array is too much for QBasic to handle. A more realistic approach would be to just place the ROM banks into a bunch of arrays and then use VARPTR and PEEK to do indirect addressing. This would probably be faster and would cut memory usage in half since PEEK does byte addressing, but at this point you're fighting the spirit of QBasic a bit too much for my taste.

And second, the CHR ROM needs to be on-screen at all times. If I sacrificed the entire user interface, I could get an additional six banks on-screen, for a total of eight, but it would just look gross. Four banks in total would be doable, I guess.

As cool as it would be to run Zelda on this thing, don't hold your breath.

Lack of features

Standard emulator features like save states and screenshots are missing. There is a save function in the code, but it's just a stub, so I must have thought about it and then run out of steam.

Conclusions

uNESsential was my main spare time project for about a year in my teens. There are man-weeks if not months sunk into the original code. Sadly, I abandoned it in a rather unfinished state. Working on it now, I was surprised to find how little work was actually needed in order to make drastic improvements to it. And while it is unfortunate that I never quite finished the project back then, it's been highly enjoyable to work on it now, so: thank you, 16-year-old me, I guess!

uNESsential is of course always going to be a crappy way to experience NES games, and that's fine, but today it is more playable and complete than I ever dreamed it could be when I wrote the original code and I hope its existence amuses you at least a tiny bit as much as it amuses me.