AI Daily Brief - 2026-03-30

6o6 v1.1: Faster 6502-on-6502 virtualization for a C64/Apple II Apple-1 emulator

📅 2026-03-29 · 👤 ClassicHasClass (noreply@blogger.com) · 📡 oldvcr.blogspot.com

I'm doing periodic updates on some of my long-term projects, one of them being 6o6, a fully virtualized NMOS 6502 CPU core that runs on a 6502 written in 6502 assembly language. 6o6 implements a completely abstracted memory model and a fully controlled execution environment, but by using the host's ALU and providing a primitive means of instruction fusion it can be faster than a naïve interpreter. This library was something I wrote over two decades earlier for my KIM-1 emulator project for the Commodore 64, and relatively recently I open-sourced and discussed it in detail . It runs on just about any 6502-based system with sufficient memory. For this update I made some efficiency improvements to addressing modes, trimmed an instruction out of the hot path, provided an option for even more control of the 6502 interrupt flag and implemented a faster lane for direct stores to 6502 zero page (as well as the usual custodial and documentation updates). And, of course, any complex library needs a suite of examples, and of course, any update to a complex library demands new examples to play with too. So, given that this year is Apple's 50th anniversary (and, as it happens, my own 50th year of existence personally), what better way to show off a 6502-on-6502 virtualization library than with an Apple-1 emulator ... that runs on the Commodore 64 or Apple II? Now yea, verily, this is hardly the first such example and several others have done something of the sort, but I submit that 6o6 makes our take on it here unique, and as a bit of fun we'll discuss the Apple-1's hardware and look at all that prior 8-bit emulator art for comparison (for the C64 and Apple II and even more exotic systems like the SAM Coupé). Let's get the technical notes out of the way first and then we'll get to the rogues' gallery. 
In broad strokes, 6o6 provides a virtual machine on your 6502 computer that itself implements a full NMOS 6502 software core (documented instructions only), written in 6502 assembly language. You provide a harness, which is the VM's sole access to guest memory, and a kernel, which acts as a hypervisor. The kernel calls the VM repeatedly, which uses the harness as an interface to access guest memory and executes one (or, if "extra helpings" instruction fusion is on, several) guest instruction(s) from it, returning to the kernel. The kernel then examines the guest processor state and acts upon or modifies it as appropriate, then calls the VM again to run more instructions, and so on.
Because the VM only accesses guest memory strictly via the harness, the harness thus becomes a highly flexible virtual memory manager: it alone maps virtual addresses to physical addresses, so it can page things in and out, synthesize address space on the fly and/or throw exceptions and faults back to the kernel. One of our examples runs a guest (with EhBASIC) on a Commodore 64 or 128 entirely from a geoRAM paged RAM expansion, managed by the harness; no part of the guest memory is actually on the computer itself. You can read about 6o6's development in more detail.
Although the changes here introduce minor functional improvements with how 6o6 handles IRQs (and by extension BRKs, which the 6502 treats as a software interrupt), this update is primarily a performance one. 6o6 is tightly bound to the xa65 cross-assembler's macro system, which it uses to inline large sections of code relating to memory access. Although this certainly bloats the VM, the macros also make it substantially faster because overhead from subroutine calls and returns can be avoided in the hot path.
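The division of labour between harness, VM and kernel described above can be sketched in a few lines of Python. This is only a toy model of the structure, not 6o6's actual interface: the class names, the `GuestFault` exception, the lazily allocated pages, the write-protected region at `$FF00` and the five-opcode instruction subset are all assumptions made for illustration, and processor flags are not modelled at all.

```python
class GuestFault(Exception):
    """Raised by the harness; the kernel decides what to do about it."""


class Harness:
    """Sole gateway to guest memory: maps guest addresses to backing storage."""

    def __init__(self, protected_from=0xFF00):
        self.pages = {}                       # page number -> bytearray(256), made on demand
        self.protected_from = protected_from  # hypothetical write-protected region

    def read(self, addr):
        page = self.pages.get(addr >> 8)
        return page[addr & 0xFF] if page is not None else 0

    def write(self, addr, value, force=False):
        if addr >= self.protected_from and not force:
            raise GuestFault(f"write to protected address ${addr:04X}")
        self.pages.setdefault(addr >> 8, bytearray(256))[addr & 0xFF] = value & 0xFF


class VM:
    """Toy core: runs one guest instruction per call, touching memory via the harness only."""

    def __init__(self, harness):
        self.h, self.pc, self.a, self.x = harness, 0, 0, 0

    def fetch(self):
        value = self.h.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def step(self):
        op = self.fetch()
        if op == 0xA9:                        # LDA #imm
            self.a = self.fetch()
        elif op == 0x8D:                      # STA abs
            lo, hi = self.fetch(), self.fetch()
            self.h.write(lo | (hi << 8), self.a)
        elif op == 0xE8:                      # INX
            self.x = (self.x + 1) & 0xFF
        elif op == 0x00:                      # BRK: hand control back to the kernel
            return "brk"
        else:
            return "illegal"
        return "ok"


def kernel(vm, max_steps=1000):
    """Hypervisor loop: call the VM, inspect the outcome, decide what happens next."""
    for _ in range(max_steps):
        try:
            status = vm.step()
        except GuestFault as fault:
            return f"fault: {fault}"
        if status != "ok":
            return status
    return "timeout"


def load(harness, addr, code):
    """Kernel-side loader: copies guest code in, bypassing write protection."""
    for offset, byte in enumerate(code):
        harness.write(addr + offset, byte, force=True)


if __name__ == "__main__":
    h = Harness()
    # LDA #$2A : STA $0200 : INX : BRK
    load(h, 0x0000, [0xA9, 0x2A, 0x8D, 0x00, 0x02, 0xE8, 0x00])
    vm = VM(h)
    print(kernel(vm), hex(h.read(0x0200)), vm.x)
```

Because the toy VM never touches storage except through `read` and `write`, swapping the dictionary of pages for a paged RAM expansion or a synthesized region would not require changing the core, which is the property the harness design is after.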
The most profitable of these macros are the memory access ones, where every fetch from RAM, because even reading an instruction requires a fetch, can be inlined (you define these too as they are considered part of the harness). There is a special path for instructions that access the 6502 zero page, and new in this release is a fast path for zero page stores as well as loads. Zero page is specially optimized because its physical location in memory can be precomputed, which reduces the complexity needed to resolve a virtual address. The memory access macros are all optional but are strongly advised as the VM runs rather slower without them.
Another class of macros, executed literally on every single guest instruction, are the address mode resolvers. These also call into the harness, using (and inlining) the same memory access macros when available, and after any arguments are fetched do all the math for indexing as required. The addressing mode resolvers are also inlined, so they also need to be quick, and "upon further review" several of the zero page addressing modes were setting up the virtual address results inefficiently: the zero page fast path now makes the work they used to do completely unnecessary, so that work was trimmed. This was combined with shaving an instruction out of instruction dispatch, a section of code that also must literally run on every single guest instruction, for further savings.
I'm proud of the efficiency gains 6o6 has made over the years I've iterated on it such that big wins are now understandably harder to come by, but these improvements are still measurable. 6o6 is stress-tested in multiple configurations using Klaus Dormann's well-known and community-accepted 6502 validation suite to prove full adherence to the instruction set, after which a count of instructions (as a proxy for relative execution time) is computed. A native 6502 must execute 30,646,178 instructions to pass this suite, which we count using a test framework based on lib6502.
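The host-per-guest ratios quoted in the next paragraph follow from this baseline by simple division. As a quick check, here is the arithmetic using the 6o6 1.0 and 1.1 totals reported there; the variable and function names are just labels chosen for this sketch.

```python
NATIVE = 30_646_178      # instructions a native 6502 needs to pass the suite
V1_0 = 1_602_516_769     # 6o6 1.0, fastest configuration
V1_1 = 1_561_780_659     # 6o6 1.1, same harness and kernel


def host_per_guest(total, native=NATIVE):
    """Average host instructions executed per guest instruction."""
    return total / native


saved = V1_0 - V1_1

if __name__ == "__main__":
    print(f"1.0: {host_per_guest(V1_0):.1f} host instructions per guest instruction")
    print(f"1.1: {host_per_guest(V1_1):.1f} host instructions per guest instruction")
    print(f"saved: {saved:,} instructions")
    print(f"  = {100 * saved / V1_0:.2f}% of the 1.0 total")
    print(f"  = {100 * saved / V1_1:.2f}% of the 1.1 total")
```

The percentage depends slightly on which total is taken as the baseline: the saving is about 2.5% of the 1.0 count and about 2.6% of the 1.1 count.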
In its fastest configuration, with all optimizations enabled and maximum inlining, 6o6 1.0 executed and passed the suite in 1,602,516,769 instructions. This total necessarily includes the veneer harness and kernel used to run the test suite, for an average ratio of 52.3 host instructions per guest instruction. 6o6 1.1 can now execute and pass the suite in 1,561,780,659 instructions using the same harness and kernel. Although 2.6% fewer instructions doesn't seem like a big deal, remember that a small percentage of a big number can still be a big number: the improvements bring the average down to 51.0 host instructions per guest instruction, and the processor now requires over 40 million instructions fewer to qualify. The delta increases further for code that does more reading than writing generally, or a lot of zero page activity specifically, since more of those accesses (including stores) can now be moved to a fast path.
This calls for a celebration, and as such, it is known that all celebrations must have at least one gratuitous stunt. I'm a nerd and this is mine. Some of you will have seen some of these pictures previously from when I was at the 2019 Vintage Computer Festival West.
The Apple-1 needs no introduction and I'm not going to get into the history much in this particular piece, but it is, of course, the nascent Apple Computer Company's first product ever in 1976. Only around two hundred were made and perhaps half of those survive. They were originally intended to use the Motorola 6800 CPU, but it was too expensive, and Steve Wozniak, its designer, developed the system around the new, dramatically cheaper and bus-compatible (surprise!) MOS Technology 6502 instead. Selling for the allegedly totally innocent price of $666.66 [about $3850 in 2026 dollars], all of the units were hand-assembled by Woz, Steve Jobs, and/or their small crew of assistants, and the system was sold until mid-1977 when it was replaced by the much expanded Apple II.
Units shipped with 4K of RAM and a complete system could be assembled from the board, a case, an ASCII keyboard and a composite display — no teletype required. Common upgrades included a practically essential cassette adapter (the Apple Cassette Interface card) and an additional 4K of RAM added to the onboard sockets; up to a full 64K was possible through an expansion slot, modulo I/O and ROMs. The small internal ROM monitor (WOZMON) could be supplemented by other languages loaded into RAM, most commonly Wozniak's Integer BASIC. Remaining examples now go for eye-watering amounts of money, even broken or incomplete systems, especially because units sent back to Apple for trade-in value were destroyed. Today, the Apple-1 Owner's Club is here to remind you that you don't have an Apple-1, and neither do I. Still, its historicity is such that for a system very few of us will ever touch, let alone have in our own homes, the Apple-1 has a lot of people interested in it. The attraction is enhanced by hardware so simple to grok that it makes a popular target for emulator authors and replica builders, almost all of whom (myself included) haven't ever touched a real one either. The Apple-1's basic operation can be trivially simulated with a 6502 core and some sort of terminal, which is actually a reasonable basic summary of the system architecture, since it was constructed around a terminal design Woz had built in 1974 out of a keyboard from Sears (remember when Sears sold keyboards? um, remember Sears?) and an off-the-shelf television set. The video display is entirely 40x24 text and centred on seven one-kilobit Signetics 2504 shift registers, six of which hold the character bits for each screen location and the seventh where the cursor is at (a binary-one in its current location). Four of these shift registers have one 74157/74LS157 quad selector/multiplexor and the remaining two have another, while the cursor shift register is supervised by a 74175/74LS175 quad flip-flop. 
The process of drawing the screen breaks it down into 24 rows. For each row, 40 bits from each main kilobit shift register are clocked into a smaller Signetics 2519 6x40 shift register used as a line buffer (and cycled back into the kilobit registers). This smaller shift register is repeatedly cycled to draw each line of each character, using the bits to index glyph lines stored in a Signetics 2513 character generator. Those lines are fed into an even smaller eight-bit 74166 shift register and clocked out as dots to the screen.
A new character can only replace another if the cursor is in the row currently being drawn and at the point where that character would be displayed. This means at most one character can be added to the screen each frame (60.05fps). When this happens, if the character is in the printable range, a write signal to the 74157s causes the character's code bits to be both propagated and loaded into the shift registers, and the cursor moves one bit forward. A "busy bit" can be read by the 6502 to know when the video hardware is ready to accept another character. (If you write to the hardware anyway, the result is officially undefined, though reportedly unhelpful. Ken Shirriff's shift register analysis is instructive.)
To turn the new character from 7-bit ASCII (the upper eighth bit is used for the "busy bit" to indicate when the terminal is ready to accept a character) into the 6-bit index of the desired character glyph, one of the bits must be converted. Counting bits from 1, the shift registers only take input from bits 1 (value 1), 2, 3, 4, 5 and 7 (value 64). Bits 1-5 are passed through unchanged (to the 2513's A4-A8 inputs via the 2519's I1-I5/O1-O5 lines), but the 2513's A9 input is set to the inverse of bit 7 by the NOR gate at C10 via the 2519's I6/O6 lines. You can see the resulting sequence emitted by this Perl program:
% perl -e 'for($i=32;$i<128;$i++){$j = $i & 95; $j |= ($j&64)?0:32; $j &=63; printf(" %02x ",$j);}'
20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
Most control characters are considered not printable, so nothing happens and they are never inserted into the shift registers (but the per-frame penalty is still paid). These characters are filtered out by the logic chips in positions C5-C9, C12 and D12, which also incorporate whether the cursor is present at that position. Only the carriage return is checked for specifically, by the C6 74LS10 NAND and C5 74LS27 NOR gates, and used to move the cursor to the next line. When the cursor moves off the bottom edge of the screen, a row of blanks pushes the top line out of the main shift registers and the cursor resets. All of this happens in hardware and independently of the 6502.
Clearing the screen is even done manually with a button to signal the two 74157/74LS157 quad selector/multiplexers' strobe lines, making them zero the bits to and from their shift registers; the CPU itself is unaware of that process too. Since all the bits in the kilobit shift registers end up clear, bit 5 gets set in the 2519 and no others, which is the space glyph.
Because of this design, there is no CPU-accessible video memory and thus no way at all for the CPU to know what's on the display. For I/O the onboard Motorola 6820 PIA provides single locations to emit to the display and read a character from the keyboard (and control registers for each), and that's it. Any program accepting user input must maintain its own command line buffer, which WOZMON and Integer BASIC both do.
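The character handling described above is compact enough to model directly. The following Python sketch is a behavioural approximation only, not a gate-level model: `glyph_index` mirrors the bit conversion the Perl one-liner performs, while the `Terminal` class, its field names and its wrap-at-column-40 behaviour are assumptions made for illustration.

```python
COLS, ROWS = 40, 24
SPACE = 0x20  # glyph index of a blank cell


def glyph_index(code):
    """6-bit index into the 2513 character generator for a character code."""
    low5 = code & 0x1F                 # bits 1-5 pass through unchanged
    bit7 = (code >> 6) & 1             # the bit with value 64
    return ((bit7 ^ 1) << 5) | low5    # sixth index bit is the inverse of bit 7


class Terminal:
    """Write-only display: the CPU can send characters but never read them back."""

    def __init__(self):
        self.rows = [[SPACE] * COLS for _ in range(ROWS)]
        self.row = self.col = 0

    def _newline(self):
        self.col = 0
        if self.row < ROWS - 1:
            self.row += 1
        else:                          # scroll: top line drops out, a blank row enters
            self.rows.pop(0)
            self.rows.append([SPACE] * COLS)

    def write(self, code):
        """Accept one character (the real hardware takes one per frame)."""
        code &= 0x7F                   # the high bit never reaches the video section
        if code == 0x0D:               # carriage return is the only control code acted on
            self._newline()
        elif code >= 0x20:             # printable range
            self.rows[self.row][self.col] = glyph_index(code)
            self.col += 1
            if self.col == COLS:       # assumed: running off the row moves to the next
                self._newline()
        # any other control code stores nothing and leaves the cursor in place


if __name__ == "__main__":
    term = Terminal()
    for ch in b"HELLO\rhello":
        term.write(ch)
    print([f"{g:02x}" for g in term.rows[0][:5]])
    print([f"{g:02x}" for g in term.rows[1][:5]])
```

Both rows print the same glyph indices, because codes 96 to 127 select the same glyphs as 64 to 95.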
Otherwise, there are no graphics modes, no colour, and not even reverse or flashing video, with the only thing flashing at all being a small "@" cursor — which is also independently maintained by the hardware, using a 555 wired into the 2519 to cycle it on and off. And how do we know the cursor is an @? The presence of the cursor bit at the current position suppresses the write signal and thus the bits entering the kilobit shift registers, making them zero for that single character position, while the 555's output (gated by an AND gate to make sure the cursor is present) becomes the other input to the same NOR gate at C10. If the 555 has turned the cursor on, the NOR gate emits a zero to the 2519's I6 line and thus to the 2513, and no bits set yields the @ glyph. Otherwise, the NOR gate emits a one to the same line, and as we already know, that becomes a space glyph. The cursor never goes backwards, so it never needs to restore any character. The situation is only slightly complicated by a small jumper area in the middle of the board used to configure the machine's memory map. You can see it in this photo, or zoom in on the Wikipedia picture of the mainboard . Each of the 16 points (0-F) linked to a corresponding 4K region of the 6502's addressing space. The upper four bits of the address bus were decoded by a 74154/74LS154 to those 16 outputs; your jumper then went between those outputs and whatever your enable pin or circuitry was attached to. The board had seven hardwired lines for this, R, S and T (user configurable, exposed on the expansion slot and unconnected from the factory), W and X which were connected by solder jumpers to $1000 and $0000 respectively, Y which was connected by a solder jumper to $F000, and Z which was connected by a small wire to $D000. 
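The jumper block amounts to a lookup from the top four address bits to a select line. A minimal Python sketch of the factory wiring described above follows; the dictionary and function names are made up for illustration, and the unconnected R, S and T lines are simply absent from the table.

```python
# Factory wiring per the description above: 4K block number -> select line.
FACTORY_JUMPERS = {0x0: "X", 0x1: "W", 0xD: "Z", 0xF: "Y"}


def select_line(addr, jumpers=FACTORY_JUMPERS):
    """Return the select line asserted for a 16-bit address, or None if unwired."""
    block = (addr >> 12) & 0xF     # the 74154 decodes the upper four address bits
    return jumpers.get(block)


if __name__ == "__main__":
    for addr in (0x0000, 0x1FFF, 0x8000, 0xD012, 0xFFEF):
        print(f"${addr:04X} -> {select_line(addr)}")
```

Rewiring the board corresponds to changing entries in the table, which is why software written against one jumper arrangement can misbehave on another.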
On the other end W and X went to the chip select lines for whichever 4K banks of RAM were present, Y went to one of the chip enable lines on the WOZMON ROM and Z went to one of the chip select lines on the PIA. The idea was that you could rewire these to whatever was on the included breadboard section of the computer or in the expansion slot, but most software assumed the PIA would be at $D000 and without low RAM and high ROM the system wasn't too useful, so X, Y and Z weren't often changed. On the other hand, a frequent modification was to cut the W jumper and run a new wire from W to $E000, which moved the upper 4K of RAM to just under the ROM region. Integer BASIC, for example, was commonly loaded to that location, and because the change made RAM no longer contiguous, there was less risk of a BASIC program (in the low 4K) inadvertently corrupting it.
For a machine with little comprehensive documentation, one that lots of people know about but only a rather tiny minority have actually used, you should expect that there would be ... varying ideas about its user experience, as it were. I'd like to emphatically insist this is not an indictment or criticism of any of the authors below, just an observation on what I consider an interesting subniche around an exceedingly rare piece of history. Nor is this an exhaustive exposé on all the ways you can emulate an Apple-1; that includes the assorted hardware replicas, some of which can go for pretty stupid money themselves, but are really just emulators to me of a different sort graced by the heady smell of solder.
Still, we'll need something to compare against, and a useful test vector for comparison is this one, which comes right out of the Apple-1 manual written by Apple co-founder Ronald Wayne. We'll use it on our various candidate implementations (including my own) to compare their behaviour.
As our benchmark, since I sadly have no Apple-1 at Floodgap (I look forward to your offers), MAME has a very competent Apple-1 driver which first originated in an early version of MESS (somewhere between 0.2 and at least 0.37) before their merger. Unlike the likely majority of the other programs we'll discuss in this article, including mine, the MESS/MAME driver looks to have been revised by people either with access to or at least excellent understanding of the actual hardware, first an initial version I can't credit, then some updates by Rodney Hester, then a substantial revision credited to Colin Howell who improved the video emulation and later added cassette support. (Current versions of MAME use a complete rewrite by R. Belmont that nevertheless maintains the correctness of the earlier releases.)
On the other hand, the major historical Apple-1 emulators (the actual hardware being practical unobtanium) derived their operation almost entirely from the described behaviour in the manual. Thus, we would expect any good emulation to support some semblance of this sequence, and most Apple-1 emulators at least attempt to run it as written. The devil's of course in the details of what isn't described in the manual, such as how the monitor reacts when characters are typed, which glyphs precisely are displayed, and even what the cursor looks like, all of which we previously laboriously derived from the schematic.
This is how a session looks in modern MAME. Up at the top, after the backslash being used as a prompt (printed on reset or pressing ESCAPE), we entered the code in hexadecimal, then printed the contents of those memory locations for verification, then after deliberately backspacing over a line (the underscores, which after completely obliterating what we typed internally makes the monitor move to the next line) we started the program running.
You'll notice that there is no way to move the virtual carriage back, so the characters couldn't actually be deleted from the screen when we backed up. You'll also notice no address was specified to run ("R") from. In this case, the fact it works at all is due to a remarkable coincidence: with no argument given, the execution starts from the last address accessed (tracked in a zero page variable called XAML ), which in this case is $000a because it was the last memory location we displayed, which contains $00 because we put it there as part of the final instruction, which is the BRK opcode, so the 6502 pulls the IRQ vector at $fffe from WOZMON, which is ... $0000, and the program duly runs from there. The program trivially disassembles to LDA #0:BACK TAX:JSR $FFEF:INX:TXA:JMP BACK . I'm assuming the use of the X register is just to make increments more straightforward, since the routine at $ffef will display the character in the accumulator once the video hardware has indicated it's "ready" to accept it. When you start it, the accumulator A begins at zero. The first thirteen (0-12) characters are control characters and never actually enter the shift registers. However, printable and unprintable characters alike have the same approximate sixtieth of a second acceptance rate per frame regardless of their ultimate disposition, so there ends up being a brief but noticeable pause where nothing appears. Only when we get to character value 13 do we move to the next line, then again ignore the next eighteen (14-31) characters which are also unprintable (another brief pause), then get into the printable range. Characters 32 to 95 appear with the character shapes you'd expect and at the same rate, though characters 96 to 127 are also printable, just repeats of 64 to 95. After that, because the high bit is never passed to the video hardware, codes 128 to 255 appear (or not) exactly the same as codes 0 to 127, and then X (and then A) overflows to zero. 
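To make the walkthrough above easier to follow, here is the test program as bytes, hand-assembled in Python from the disassembly just given using the standard 6502 opcodes, together with a tally of what happens to each code on one pass through all 256 values. The function names are illustrative, and the timing assumes the loop is limited purely by the display accepting one character per 60.05 Hz frame, as described.

```python
from collections import Counter

FRAME_RATE = 60.05  # characters accepted per second (one per video frame)

PROGRAM = bytes([
    0xA9, 0x00,        # 0000  LDA #$00
    0xAA,              # 0002  BACK: TAX
    0x20, 0xEF, 0xFF,  # 0003  JSR $FFEF   (WOZMON character output)
    0xE8,              # 0006  INX
    0x8A,              # 0007  TXA
    0x4C, 0x02, 0x00,  # 0008  JMP BACK
])


def disposition(code):
    """What the video hardware does with a character code."""
    code &= 0x7F                   # the high bit is never passed to the display
    if code == 0x0D:
        return "newline"
    if code < 0x20:
        return "ignored"
    return "printed"


def pause_seconds(n_codes):
    """Time spent on n_codes characters, shown or not, at one per frame."""
    return n_codes / FRAME_RATE


if __name__ == "__main__":
    print(f"byte at $000A: ${PROGRAM[0x0A]:02X}")   # high byte of the JMP target
    print(dict(Counter(disposition(a) for a in range(256))))
    print(f"codes 0-12:  {pause_seconds(13):.2f} s with nothing displayed")
    print(f"codes 14-31: {pause_seconds(18):.2f} s with nothing displayed")
    print(f"full pass:   {pause_seconds(256):.2f} s")
```

The byte at `$000A` is the high byte of the `JMP` operand, which happens to be `$00`, the `BRK` opcode that the bare `R` command ends up executing.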
The routine runs forever in this fashion, scrolling the screen as needed, until stopped. Halting the program is usually accomplished with the reset key, which is a non-destructive restart, and puts us back in WOZMON. All of that occurred with a very simple 6502 machine language program calling a simple routine in the monitor which simply twiddles two locations on the PIA, but running the 6502 code is actually the easy (or at least best defined) part, because the obvious thing to do is run 6502 code on ... a 6502 directly, which would run at native speed and completely authentically. The main problem here is there's no MMU or hardware virtualization on a vanilla 6502 CPU (you see where I'm going with this), so you can't prevent it from reading or writing where it thinks the PIAs are, or WOZROM, or anything else like, say, your support code, because the memory map almost certainly won't match. Still, it's a great hack to try and people certainly tried it, though to make it work requires breaking some rules, so the interesting part is which rules and how. The earliest example I can find of an Apple-1 on any 8-bit computer is appropriately enough for the Apple II , written by Mark Stock, and dates to 2006 . Mark appears to indeed have based his emulator on the manual, so some gaps shouldn't be surprising: for example, it uses a block cursor, since most people would have supposed that's what it had, and it moves the carriage back with deletions since most people would have supposed that's how it worked. Later patches were offered to adjust. Although the files are not on the Wayback Machine, at least one version is on Asimov . This emulator displays text on the Apple II high-res screen so that the whole of $0000-$3fff can be used by the emulated Apple-1, including zero page and stack, and then also loads a patched version of WOZMON into $ff00 on the language card RAM so that calls to it work too and gives us proper 6502 high memory vectors at the same time. 
The reason for patching WOZMON is the PIAs, which is how this emulator "breaks the rules." Although the PIA's address range at $d000 is available in this configuration, the Apple II has no routine interrupts where a service routine could regularly pick up or deposit values, and even if there were, the potential for race conditions should be glaringly obvious. Instead, the patches turn PIA loads and stores into calls into its support routines: an STA $D012 which would write a character to the screen would become JSR $805B, including in the routine at $ffef, and an LDA $D011 which would read a character from the keyboard would become JSR $8070. If your program wasn't using WOZMON, then you'd need to patch it yourself to do these things. Mark did a version of Lunar Lander as an example.
Here's our test vector output. The stock Apple II doesn't have free-running timers either, so it isn't possible to slow down the video output to the Apple-1 framerate, nor does the cursor blink (and since we're on the hi-res screen, you can't cheat with flashing characters). The Apple DELETE key isn't mapped to anything and just emits a nonsense character, but underscore (i.e., SHIFT-minus) is treated as delete. Both the delete moveback and the extra linefeeds were corrected with later patches, though the wrong bits on the 96-127 character range were not.
If anything, the top speed of this emulator is slightly too fast. The Apple-1 6502 ran at a nominal rate of 1.022727... MHz, divided down by fourteen from the single 14.318181... MHz (1260/88) crystal also used for video generation, but RAM refresh imposes a penalty of four out of every 65 cycles for an effective throughput of 0.9598MHz. On the Apple II, the system runs at the full /14 speed except for every 65th cycle which is lengthened by two ticks of the same master crystal, yielding an average speed of 1.020484MHz.
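The two clock figures above can be reproduced with exact fractions. This is just the arithmetic from the preceding paragraph written out in Python; the variable names are labels chosen for the sketch.

```python
from fractions import Fraction

MASTER_HZ = Fraction(1260, 88) * 1_000_000   # 14.318181... MHz master crystal
NOMINAL_HZ = MASTER_HZ / 14                  # 1.022727... MHz CPU clock

# Apple-1: four of every 65 cycles are lost to RAM refresh.
apple1_effective = NOMINAL_HZ * Fraction(65 - 4, 65)

# Apple II: every 65th cycle is stretched by two master-clock ticks,
# so 65 CPU cycles take 65 * 14 + 2 = 912 ticks.
apple2_average = MASTER_HZ * Fraction(65, 65 * 14 + 2)

if __name__ == "__main__":
    print(f"nominal:           {float(NOMINAL_HZ) / 1e6:.6f} MHz")
    print(f"Apple-1 effective: {float(apple1_effective) / 1e6:.4f} MHz")
    print(f"Apple II average:  {float(apple2_average) / 1e6:.6f} MHz")
    print(f"ratio:             {float(apple2_average / apple1_effective):.3f}")
```

On these figures the Apple II executes roughly 6% more cycles per second than the Apple-1 does after refresh, which is the sense in which native execution there is "slightly too fast."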
Understandably and unavoidably you can hose the emulator (and, for that matter, WOZMON) by overwriting its code, and on a system with an NMOS 6502, jam opcodes like $02 will still cause the system to hang up until it's reset. However, one interesting thing is that on an Apple II+ (NMOS from the factory), CONTROL-RESET becomes like pressing the Apple-1's Reset button and the behaviour actually matches. This doesn't work on the IIe and up, where it's just CONTROL-RESET and you exit to BASIC (but you can CALL 32768 to get back in, and $02 is a two-byte no-op on CMOS 65C02s anyway). Simon Owen's Apple-1 emulator appears to be the next in line, as I can't find another one predating his first version in 2007, but he didn't do it on a 6502 — he did it on a Z80, or the SAM Coupé to be precise, arguably the most advanced system of the classic ZX Spectrum family. Simon explained to me that his emulator was written to model the emulation in Pom1, written by Verhille Arnaud, an early Apple-1 emulator which seems to have had its first public release in 2000. Since Pom1 was written in Java, you can still run it, and some versions of it are preserved on the Wayback Machine. The last release I can find is 0.70. Pom1 itself seems to have taken cues from what I believe is the earliest Apple-1 emulator on any platform, written for the Macintosh. This is Sim6502, originally developed in 1997 by Achim Breidenbach, and is still downloadable from a veritable Garden of Macintosh sites or the Wayback Machine . The readme accompanying it suggests Breidenbach also wrote it based on the description of the system in the manual, though he made some different choices: among others Sim6502 advances the cursor on control characters (effectively rendering them as blanks) and doesn't show the second repeated set of characters from 96 to 127 except as blanks too, because without looking at the schematic the bit conversion wouldn't have been obvious. 
Although it does try to correctly model the character output rate, its 6502 emulation also has some notable deficiencies, such as lacking decimal mode entirely (documented in the readme) and failing to run the IRQ vector via BRK with R (you need to kick off our example with an explicit 0R, or else it gets treated like a soft reset). Reportedly Breidenbach was working on a later version that addressed some of these gaps but I can't seem to find any evidence it was released.
Despite a better 6502 emulator and video improvements, Pom1 nevertheless mirrored some of Sim6502's idiosyncrasies in its own emulation core like the block cursor, though these varied from version to version. For example, 0.62 had a working delete key that moved the carriage back, but required you to have the caps lock down. By contrast, 0.70 correctly only does capital letters, but its delete key just advances the cursor and is not recognized by WOZMON as delete (you're supposed to press SHIFT-MINUS to get the underscore instead). The later version also incorporates the same or similar monitor that intercepts the BRK our test sequence executes, which 0.62 didn't. Both versions also don't suppress control characters when emitted to the PIA's output port, which again render as blanks, though they do properly render the 96-127 range. (Many of these issues were subsequently fixed, including in the native SDL-based version maintained by John Corrado. Appropriately enough, Mark Stock later ported SDL Pom1 to Mac OS X.)
Simon's release is based on the earlier Pom1, e.g., block cursor, Delete actually deletes, control characters advance the cursor. The current version also shows lowercase in the 96-127 range for our test vector, and likewise implements the same or similar monitor to Pom1 0.70 that traps BRK. But Simon's emulator is particularly notable because it must implement its 6502 core in Z80 assembly language, because it's running on a Z80.
Since it doesn't run Apple-1 programs on-processor, it needs no adjustments to run anything, and is similarly immune to many catastrophic faults. For example, if you try running a jam opcode, this emulator just ignores it. It also includes not only Integer BASIC, but also Applesoft BASIC, and programs seem to work there as well. Simon is working on improvements to the emulator core to bring it up to the standard in MAME/MESS, but you can play with it now or in the SimCoupe emulator. It's open source, too.
Of course, running the code effectively under an interpreter does impose an obvious speed penalty. The SAM Coupé has a 6MHz Z80 and as such Simon estimates its 6502 emulator runs at about 20% of the speed of a real Apple-1, though the emulator's overall performance is at or near native speed when printing because of the ~60Hz limit.
Fast-forward a few years, and with the more accurate MAME driver the extremely narrow subniche of 8-bit Apple-1 emulators got more accurate too, though now different rules needed breaking. People might be surprised to know you could simulate the Apple II, or at least some portion of Applesoft BASIC, on the Commodore 64 (monstrosities like the Spartan Mimic notwithstanding, though I'd still love to play with one), and I seem to recall at least a couple others in that vein. But in 2013, Alexei Eeben decided to do it for the Apple-1, incorporating his own monitor program C'mon for the VIC-20 which he cross-ported, dubbing it Green Delicious.
Green Delicious also runs native, directly on the Commodore 64's 6510 CPU, which to anyone with a knowledge of the C64's architecture should immediately highlight a big problem: the 6510 reserves the lowest two locations of RAM ($0000 and $0001) for its on-chip I/O port. That means if we even try to type in our sample vector ... it will crash, and crash hard, as soon as it stores the first byte or two. Any program that expects to use or start there will simply not work.
Thus we need to run our test from another location, say $0300, and from there it works as expected and shows the right character set at the correct approximate speed. IMHO Green Delicious is without a doubt one of the prettiest Apple-1 emulators on any platform, not just 8-bit, from its pixel correct output (you can change the colour with F3 if you don't like green) to the flashing rainbow Apple logo (you can restore the classic @ with F5). In fact, the only minor flaw I could find in its appearance is that the cursor blink rate is a bit off: the 555 has a period of about half a second, two-thirds of which is the on-phase.
Green Delicious uses its own variation of patching to handle PIA access, and its pre-patched internal version of WOZMON also has a live IRQ vector which it uses for housekeeping tasks. (This vector doesn't appear to handle BRK instructions, though, which will also make it crash.) Unlike our earlier Apple II-based emulator, however, Green Delicious' built-in loader will patch your binary on the fly, and log what got patched to memory where so you can see what it did. Surprisingly, this automatic patcher works for a lot of binaries without modification. BASIC programs run fine, and since the interpreter runs at or faster than a real Apple-1, they run at a ripping speed. Indeed, this is Green Delicious' strongest turf. On the other hand, if you try to run native code it's also rather easy (even unintentionally) to crash the emulator and invariably the 64 itself, and after a couple hard stops with a real 64 and a real 1541 disk drive you're going to have a real conniption waiting for it to reload again.
So let's go back to Simon's idea from the SAM Coupé of running the Apple-1's 6502 code in emulation, but this time we'll use a virtualization library for the 6502 itself that reuses the 6502's own ALU for speed and correctness on guest code, yet completely isolates memory and maintains full control of the guest processor state at all times.
Hmm, I wonder where we could find such a library?? Think think think! ... VA1, for Virtual Apple-1, will run on the Commodore 64 or an Apple II+ or better with at least 48K; here we're running it on an emulated Commodore SX-64. To handle the screen we use a custom character set based on the Signetics font shapes and set the screen to 24-row mode, starting everything on the second line so scrolling "just works." The Commodore version LOADs and RUNs like a BASIC program and enters WOZMON immediately. You can tell we're now isolated from the main CPU because we can store our test program right at $0000 ... and run it, and not only does it not crash, it runs successfully and identically to MAME. Since the C64's jiffy clock (maintained by the Timer A interrupt) is roughly the same speed as the Apple-1's video hardware, we sync to it to clock characters out and flash the cursor at the right time. This is the same basic idea we used in Oblast except there we also intentionally overdrove Timer A to speed up the game. The cursor is maintained by checking the jiffy clock on every cycle of the virtual machine and changing the character code appropriately. We simply use the Kernal's character out routine here which effectively maintains its own, merely disabled, cursor on each screen's logical line. We use this position to draw our own, removing it if we are going to a new line or about to scroll the screen, and for a little extra paranoia ensure that every logical screen line in the Kernal's screen editor map corresponds exactly to the same physical screen line. VA1 has key equivalents for the Apple-1's special functions like other emulators. The ESCAPE key is sent with F1 and you can use INST/DEL to delete (or press the backarrow for an underscore, which is the same ASCII value). For resetting the guest Apple-1, the Commodore 64 version uses the RESTORE key; I somehow found it appropriate to generate an NMI on the host computer to generate a reset signal on the guest. 
Just tap it and the virtual machine will safely dump you back into WOZMON with memory intact. 6o6 can throw exceptions and VA1 will handle them. Here, if you run an illegal instruction (undocumented NMOS instructions are intentionally not supported, sorry) like a jam opcode, the 64 won't crash. Instead, 6o6 throws an illegal instruction exception, VA1 prints a message (you know it's coming from the emulator because it appears instantaneously), and puts you right back into WOZMON pointing to the offending address. For that matter, you can't overwrite WOZMON from inside the emulator either. Note that because VA1's harness synthesizes a full 64K guest address space and 6o6 hypercalls are disabled, the only exception you'd get on a practical basis would indeed be an illegal instruction. VA1 is 109 blocks long and intentionally uncompressed so that you can overwrite it in place with anything you load from disk. It provides an 8K RAM system; by default it comes with Integer BASIC in the high 4K RAM at virtual address $E000 and Hamurabi [sic] loaded into the low 4K RAM at virtual address $0000. If you type E2B3R to warm-start Integer BASIC and then type RUN, the game starts. Running BASIC programs is admittedly where the processor emulation drags noticeably. Simon's claim of 20% native for his SAM Coupé 6502 emulator is based on a processor that clock-for-clock (6MHz versus 1.022727 MHz [NTSC] or 0.985248611 MHz [PAL], but Z80 instructions on average take more clock cycles) is about two or three times faster than the C64's and has more registers, and is running a purpose-written emulator instead of the more generalized one here. While the harness and kernel in VA1 are very thin, they do add a non-trivial cost, and when no display output is being generated VA1 is probably around ten to twenty times slower than a real Apple-1 depending on the instruction mix. 
When emitting characters, however, both emulators run roughly at full speed because now the video hardware becomes the rate-limiting factor. Because VA1 is uncompressed, you can hit RUN/STOP-RESTORE (not just RESTORE) and load something over it, then re-RUN the program. You can even load it before you run it. The low 4K bank at virtual address $0000 is located at physical address $1000, and the high 4K bank at virtual address $E000 is at physical address $2000. Here, we'll load an assembly language version of Lunar Lander (files ending in .sa have a Commodore-style starting address for your convenience) to virtual address $0300 (physical address $1300), return to VA1 by running it, ... and with 300R you can play an unmodified copy of Lunar Lander at a speed indistinguishable from a real Apple-1. (I think.) I decided to do it this way instead of building in a formal interface because firstly I'm lazy, and secondly it makes a better demonstration example if there isn't a lot of additional code. If you want an artistic Apple-1 emulator on your Commodore 64 with more glitz, you should just run Green Delicious. But you can't run Green Delicious on an Apple II, whereas you can run VA1. It runs as a binary program with BRUN. As with Mark Stock's earlier Apple-1-on-an-Apple-II emulator, and because the stock Apple II lacks a free-running clock, this emulator does not limit text output to 60.05Hz either. As for the blinking cursor, that's just because I chose a blinking @ for the cursor as generated by the Apple II's video hardware, since once again there's no clock source to sync it to. Either way, on the Apple II as well we are able to enter our test program ... and run it. The Delete key, if you have one, also maps to the underscore or you can just hit SHIFT-minus as usual. With the right cursor key (not used by the Apple-1), you can perform a non-destructive reset of the guest Apple-1, like hitting RESTORE in the C64 version. 
If you have an up key on your keyboard, that will clear the screen (or press CONTROL-K). The Apple II version also uses the built-in ROM routines to print and scroll the screen, and uses similar code to the C64 version to position its own cursor. The CPU core is otherwise exactly the same and throws the same exceptions, and fails safe in the same way. You also have the same Hamurabi program loaded into the low 4K ... and you can overwrite VA1's memory with your own programs from disk here too. Hit CONTROL-RESET and the binaries on the provided disk image can simply be BLOADed in place. The Apple II version has the same emulated memory layout, with the low 4K ($0000) at physical address $1000 and the high 4K ($E000) at physical address $2000. We load Lunar Lander, then warm start the emulator with CALL 2051 ... and then run it with 300R. More software is available from Apple1Software.com, though you will need to encode the correct starting address within VA1 (for the binary or PRG file) on both platforms. Lastly, here it is running on a real Commodore SX-64, providing as much, if not more, portability as the original Apple-1 (though it's a bit heavier). If someone wants to port this to Atari 8-bit (well within its capabilities), toss a pull request my way and I'll have a look. So happy birthday, Apple. While working on the upgrades to 6o6, I started thinking about future ways to continue to reduce the impact of memory abstraction on the core, since there aren't many more ways to make the processor emulation itself quicker. For the next release of 6o6, I would like to create a fast path for stack pushes and pulls. This wasn't a major need for the Incredible KIMplement because there wasn't a great deal of memory in the KIM-1 for storing subroutines (let alone calling them), but larger programs certainly make use of them, and may do so frequently. This can probably be done with a new set of macros, so it shouldn't be technically complex. 
But the other thing I'd like to consider is some sort of dynamic pagetable, effectively a declarative harness. Since the pagetable would already have the dereferenced physical location of a virtual page in it, the VM could just go fetch it, and a flag value could tell the VM when the harness needs to be directly consulted instead, at which point the harness still handles the load/store operation as it does now, but it could also make any adjustments and fix up the pagetable on demand, or even throw an exception. (Hey, we just reinvented the page fault!) I need to do some more thinking about this and ideally how to implement it without wrecking the current API, but it could substantially simplify memory access for the typical case, so I think it'd be worth it. Watch for these and other changes in the future. In the meantime, you can look at the changes for v1.1 in the 6o6 Github project or download updated demonstration disk images and programs for the C64 and Apple II from the Releases tab, which includes VA1. 6o6 is released under the Floodgap Free Software License. Additionally, the Incredible KIMplement (the original KIM-1 emulator on the Commodore 64 for which 6o6 was originally written) is also updated to the new processor core, plus changes to the build system for better future maintainability. You can get that from the KIMplement page, or browse the source code on Github. It too is released under the Floodgap Free Software License.
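
To make the pagetable idea a little more concrete, here is a rough sketch in Python rather than 6502 assembly. It is not 6o6's API and not how any future release is committed to working: the names `CONSULT_HARNESS`, `GuestFault`, `build_pagetable`, `load` and `demo_harness` are all made up for illustration, as is the demand-mapped page at virtual $1000. The only details taken from the text above are VA1's layout (virtual $0000 at physical $1000, virtual $E000 at physical $2000) and the general shape of the idea: a table hit goes straight to memory, a flag value defers to the harness, and the harness may fix up the table or throw.

```python
PAGE_COUNT = 256            # 64K guest address space in 256-byte pages
CONSULT_HARNESS = None      # hypothetical flag value: no precomputed mapping


class GuestFault(Exception):
    """Hypothetical exception a harness could throw back to the kernel."""


def build_pagetable():
    """Pre-map VA1's two 4K banks; every other page defers to the harness."""
    table = [CONSULT_HARNESS] * PAGE_COUNT
    for page in range(0x00, 0x10):      # virtual $0000-$0FFF -> physical $1000-$1FFF
        table[page] = 0x10 + page
    for page in range(0xE0, 0xF0):      # virtual $E000-$EFFF -> physical $2000-$2FFF
        table[page] = 0x20 + (page - 0xE0)
    return table


def load(memory, table, vaddr, harness):
    """Guest load: take the table entry if present, else ask the harness."""
    entry = table[vaddr >> 8]
    if entry is not CONSULT_HARNESS:
        return memory[(entry << 8) | (vaddr & 0xFF)]    # fast path
    return harness(memory, table, vaddr)                # slow path


def demo_harness(memory, table, vaddr):
    """Illustrative slow path: demand-map one page, fault on anything else."""
    page = vaddr >> 8
    if page == 0x10:                    # assumed extra page, purely for the demo
        table[page] = 0x30              # fix up the table for later fast-path hits
        return memory[(0x30 << 8) | (vaddr & 0xFF)]
    raise GuestFault(f"unmapped access at ${vaddr:04X}")


memory = bytearray(0x10000)             # stand-in for the host's physical RAM
table = build_pagetable()
memory[0x1300] = 0xA9                   # a byte at virtual $0300, like a loaded program
first_byte = load(memory, table, 0x0300, demo_harness)
```

The first access to the demand-mapped page goes through the harness, and every access after that hits the table directly, which is the "page fault" behaviour described above. A store path would look the same, with the harness additionally able to refuse writes to pages it wants protected.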

我正在对我的一些长期项目进行定期更新，其中一个就是 6o6，一个完全虚拟化的 NMOS 6502 CPU 核心，用 6502 汇编语言编写，运行在 6502 上。6o6 实现了一个完全抽象的内存模型和一个完全受控的执行环境，但通过利用宿主 CPU 的 ALU 并提供一种原始的指令融合手段，它可以比朴素的解释器更快。这个库是我二十多年前为 Commodore 64 的 KIM-1 模拟器项目写的，最近我才开源并详细讨论了它。它可以在几乎任何有足够内存的 6502 系统上运行。这次更新，我对寻址模式做了一些效率改进，从热路径中删减了一条指令，提供了对 6502 中断标志的更多控制选项，并实现了一个更快的直接存储到 6502 零页的通道（以及通常的维护和文档更新）。当然，任何复杂的库都需要一套示例，当然，对复杂库的更新也需要新的示例来配合。所以，鉴于今年是苹果 50 周年纪念（而且，碰巧也是我个人存在的第 50 年），还有什么比用一个 Apple-1 模拟器——运行在 Commodore 64 或 Apple II 上——来展示 6502-on-6502 虚拟化库更好的方式呢？……好吧，这当然不是第一个这样的例子，也有其他人做过类似的事情，但我认为 6o6 让我们的方案独一无二，作为一点乐趣，我们将讨论 Apple-1 的硬件，并与所有先前的 8 位模拟器艺术进行比较（C64 和 Apple II 的，以及更奇特的系统如 SAM Coupé）。让我们先把技术说明搞清楚，然后再看那些花里胡哨的展示。从广义上讲，6o6 在你的 6502 电脑上提供了一个虚拟机，该虚拟机本身实现了一个完整的 NMOS 6502 软件核心（仅限有文档说明的指令），用 6502 汇编语言编写。你提供一个 harness，这是 VM 访问 guest 内存的唯一通道；你提供一个 kernel，这是 hypervisor。kernel 反复调用 VM，VM 使用 harness 作为访问 guest 内存的接口，并从中执行一条（或者，如果"额外融合"指令融合开启的话，多条）guest 指令，然后返回给 kernel。kernel 然后检查 guest 处理器状态，并相应地执行操作或修改它，然后再调用 VM 来运行更多指令，以此类推。因为 VM 严格通过 harness 访问 guest 内存，harness 因此成为一个高度灵活的虚拟内存管理器：它单独将虚拟地址映射到物理地址，所以它可以在需要时将内容换入换出，动态合成地址空间，和/或向 kernel 抛出异常和故障。我们的一个示例在 Commodore 64 或 128 上运行一个 guest（带 EhBASIC），完全从 geoRAM 分页 RAM 扩展中运行，由 harness 管理；guest 内存没有任何部分实际位于本机电脑上。你可以更详细地阅读 6o6 的开发过程。虽然这里的改动引入了 6o6 处理 IRQ 方式的细微功能改进（进而延伸至 BRK，6502 将其视为软件中断），但这次更新主要是性能方面的。6o6 与 xa65 交叉汇编器的宏系统紧密绑定，它用这个宏系统在热路径中内联大量与内存访问相关的代码。虽然这确实会让 VM 膨胀，但宏也让速度大大提升，因为可以避免子程序调用和返回的开销。这些宏中最有利可图的是内存访问宏，每次从 RAM 读取——因为即使读取一条指令也需要一次 fetch——都可以被内联（你也要定义这些，因为它们被认为是 harness 的一部分）。有一条针对访问 6502 零页指令的特殊路径，新版本还为零页存储以及加载提供了一条快速路径。零页经过了特殊优化，因为它在内存中的物理位置可以预先计算，从而减少解析虚拟地址所需的复杂性。内存访问宏都是可选的，但强烈建议使用，因为没有它们 VM 运行起来会相当慢。另一类宏，每一条 guest 指令都会literally执行一次，是地址模式解析器。这些也调用 harness，使用（并内联）相同的内存访问宏（如果有的话），在获取任何参数后，做索引所需的所有数学运算。地址模式解析器也是内联的，所以它们也需要很快，"经过进一步审查"，几种零页寻址模式在设置虚拟地址结果时效率很低：零页快速路径现在使它们以前所做的工作变得完全不必要的，所以它们被精简并与从指令分派中删除一个操作码相结合，以进一步节省，这个代码段也必须literally在每一条 guest 指令上运行。我为 6o6 这些年来我所做的效率提升感到自豪——随着迭代，现在大获全胜当然越来越难，但这些改进仍然是可衡量的。6o6 在多种配置下使用 Klaus Dormann 著名的、业界公认的 6502 验证套件进行压力测试，以证明对指令集的完全遵守，之后计算指令数（作为相对执行时间的代理）。原生 
6502 必须执行 30,646,178 条指令才能通过这个套件，我们使用基于 lib6502 的测试框架来计数。在最快配置下，启用所有优化并最大程度内联，6o6 1.0 在 1,602,516,769 条指令中执行并通过该套件，其中必然包含了用于运行测试套件的 veneer harness 和 kernel，平均比率为每 guest 指令 52.3 条 host 指令。6o6 1.1 现在使用相同的 harness 和 kernel 在 1,561,780,659 条指令中执行并通过该套件。虽然减少 2.6% 的指令听起来不是大事，但记住大数字的一小部分仍然是大数字：改进将平均值降低到每 guest 指令 51.0 条 host 指令，而且处理器现在需要减少超过 4000 万条指令来达标。对于一般读操作多于写操作的代码，或者具体有大量零页活动的代码，这个改进会进一步增大，因为更多这些访问（包括存储）现在可以移到快速路径上。这值得庆祝，而且众所周知，任何庆祝都必须至少有一个 gratuitous 的噱头。我是个书呆子，这就是我的。有些人可能之前在 2019 年复古电脑节西区见过其中一些照片。Apple-1 无需介绍，我不打算在这篇特定文章中过多讲述历史，但它当然就是 1976 年苹果电脑公司的第一个产品。生产了大约 200 台，可能有一半仍然存在。它们原本打算使用 Motorola 6800 CPU，但太贵了，设计师 Steve Wozniak 围绕新的、明显更便宜且总线兼容的（意想不到的！）MOS Technology 6502 开发了这个系统。以据称完全无辜的 666.66 美元价格出售 [约 2026 年的 3850 美元]，所有机器都是由 Woz、Steve Jobs 和/或他们的小助手团队手工组装的，该系统销售到 1977 年中，被大幅扩展的 Apple II 所取代。出货时配备了 4K RAM，组装一个完整系统需要主板、机箱、ASCII 键盘和复合显示器——无需电传打印机。常见升级包括几乎必不可少的磁带适配器（Apple Cassette Interface 卡）和一块额外的 4K RAM 添加到主板插座上；通过扩展槽最多可支持 64K，但需减去 I/O 和 ROM。通过在内置 ROM 监控程序（WOZMON）上补充其他加载到 RAM 的语言，最常见的是 Wozniak 的 Integer BASIC，可以进行扩展。保存至今的剩余样品即使破损或不完整也要价惊人，尤其是被退回苹果折价的机器都已经被销毁。今天，Apple-1 所有者俱乐部提醒你，你没有 Apple-1，我也没有。尽管如此，它的的历史意义使得尽管我们中很少有人能够接触到它，更不用说拥有它在家中，Apple-1 仍然有很多人感兴趣。其吸引力因其硬件如此简单易懂而增强，它成为模拟器作者和复制品制造商的热门目标，几乎所有这些人（包括我自己）也从未接触过真正的机器。Apple-1 的基本操作可以用一个 6502 核心和一些终端来简单地模拟，这实际上是系统架构的一个合理基本概要，因为它围绕 Woz 在 1974 年用一台西尔斯百货（Sears）的键盘（还记得西尔斯卖键盘吗？嗯，还记得西尔斯吗？）和一台现成的电视机搭建的终端设计。视频显示完全以 40x24 文本为中心，以七个 1 千比特 Signetics 2504 移位寄存器为基础，其中六个保存每个屏幕位置的字符位，第七个保存光标当前位置（其当前位置为二进制 1）。四个移位寄存器各配有一个 74157/74LS157 四路选择器/多路复用器，剩下的两个共用另一个，而光标移位寄存器由一个 74175/74LS175 四路触发器管理。绘制屏幕的过程将其分解为 24 行。对于每一行，每个主千比特移位寄存器的 40 个比特被时钟输入到一个更小的 Signetics 2519 6x40 移位寄存器中作为行缓冲器（并循环回传到千比特寄存器）。这个更小的移位寄存器被反复循环来绘制每个字符的每一行，用这些位来索引存储在 Signetics 2513 字符发生器中的字形行。这些行被送入一个更小的八位 74166 移位寄存器，并以点的形式时钟输出到屏幕。新字符只有在光标位于当前正在绘制的行中、且在该字符将被显示的位置时，才能替换另一个字符。这意味着任何一个字符每帧（60.05fps）只能向屏幕添加一次，而且只能是一个字符。当这种情况发生时，如果字符在可打印范围内，对 74157 的写入信号会导致该字符的字符位被同时传播和加载到移位寄存器中，光标向前移动一位。6502 可以读取一个"忙位"来知道视频硬件何时准备好接受另一个字符。（如果你还是不按规矩写硬件，结果官方说是未定义的，不过据报道是没有帮助的。Ken Shirriff 
的移位寄存器分析很有启发性。）为了将新字符从 7 位 ASCII（最高位用于"忙位"，表示终端何时准备好接受字符）转换为所需字形的 6 位索引，其中一个位必须被转换。移位寄存器只接受来自第 1（0）、2、3、4、5 和 7（64）位的输入。第 1-5 位原封不动地传递（通过 2519 的 I1-I5/O1-O5 线路到 2513 的 A4-A8 输入），但 2513 的 A9 输入由 C10 的 NOR 门通过 2519 的 I6/O6 线路设置为第 7 位的反相。你可以从这个 Perl 程序看到发出的序列。对于大多数控制字符，它们被认为是不可打印的，所以什么也不会发生，它们永远不会进入移位寄存器（但每帧的代价仍然要付）。这些字符被 C5-C9、C12 和 D12 位置的逻辑芯片过滤掉，这些芯片也结合了光标是否在该位置。只有回车被特别检查，由 C6 74LS10 NAND 和 C5 74LS27 NOR 门处理，用于将光标移动到下一行。当光标移出屏幕底部边缘时，一行空白会将顶行推出主移位寄存器，光标重置。所有这些都发生在硬件层面，独立于 6502。清除屏幕甚至是用按钮手动完成的，向两个 74157/74LS157 四路选择器/多路复用器的选通线路发出信号，使它们将往返移位寄存器的位清零——CPU 本身对这个过程也是不知情的。因为千比特移位寄存器中的所有位最终都被清空，第 5 位在 2519 中被设置而其他位没有，这就是空格字形。由于这种设计，CPU 无法访问视频内存，因此 CPU 完全无法知道屏幕上显示了什么。对于 I/O，机载 Motorola 6820 PIA 提供了单个位置来向显示器发出输出和从键盘读取字符（以及每个的控制寄存器），就这样。否则，没有图形模式，没有颜色，甚至没有反相或闪烁视频，唯一闪烁的是一个小"@"光标——这也是由硬件独立维护的，使用一个 555 芯片连接到 2519 以循环开关它。光标为什么是 @？当前置位的游标位在当前位置时，会抑制写入信号，从而抑制进入千比特移位寄存器的位，使该单个字符位置为零，而 555 的输出（通过与门门控以确保光标存在）成为 C10 相同 NOR 门的另一个输入。如果 555 已经打开光标，NOR 门向 2519 的 I6 线路发出零，从而向 2513 发出指令，没有位设置就产生 @ 字形。否则，NOR 门向同一线路发出一，由于我们已经知道，这就变成了空格字形。光标永远不会后退，所以它永远不需要恢复任何字符。情况只是因为主板中间有一个小跳线区域而稍微复杂了一点，用于配置机器的内存映射。主板有七条硬连线用于此目的，R、S 和 T（用户可配置，在扩展槽上暴露，工厂未连接）、W 和 X，分别焊接到 $1000 和 $0000，Y 焊接到 $F000，Z 用一根小线连接到 $D000。另一端 W 和 X 连接到存在的任何 4K RAM 块的芯片选择线，Y 连接到 WOZMON ROM 上的一个芯片使能线，Z 连接到 PIA 上的一个芯片选择线。这个想法是你可以重新连接这些到你所包含的面包板区域或扩展槽上的任何东西，但大多数软件假设 PIA 会在 $D000，没有低 RAM 和高 ROM 系统不会太有用，所以 X、Y 和 Z 不经常更改。另一方面，一个常见的修改是切断 W 跳线，将一根新线从 W 接到 $E000，将上面的 4K RAM 移到 ROM 区域下方。例如，Integer BASIC 通常加载到那个位置，因为这个变化使 RAM 不连续，BASIC 程序（在低 4K 中）意外破坏它的风险就更小了。你应该可以预期，对于一台几乎没有全面文档但很多人知道但极少数人真正使用过的机器，用户体验会存在……各种不同的想法。我想强调这不是对下面任何作者的指责或批评，只是对我认为在一个极其罕见的历史文物周围一个有趣的亚文化现象的观察。这也不是对所有你可以模拟 Apple-1 的方式的详尽揭露；其中包括各种硬件复制品，其中一些本身就要花很多钱，但实际上对我来说它们只是不同种类的模拟器，沐浴在焊锡的芳香中。尽管如此，我们需要一些东西来比较，并且一个有用的比较测试向量是这个，它直接来自苹果联合创始人 Ronald Wayne 编写的 Apple-1 手册。我们将把它用在我们的各种候选实现（包括我自己的）上，以比较它们的行为。作为我们的基准，因为我悲惨地在 Floodgap 没有 Apple-1——我期待你的提议——MAME 有一个非常称职的 Apple-1 驱动程序，最初源自早期版本的 MESS（大约在 0.2 和至少 0.37 之间，在它们合并之前）。与我们将在本文中讨论的其他程序（包括我自己的）可能的大多数不同，MESS/MAME 
驱动程序看起来是由有接触或至少对实际硬件有出色理解的人修订的，先是初始版本我无法署名，然后是 Rodney Hester 的一些更新，然后是 Colin Howell 的实质性修订，他改进了视频仿真，后来又添加了磁带支持。（当前版本的 MAME 使用 R. Belmont 的完整重写，但仍保持早期版本的正确性。）另一方面，主要的历史 Apple-1 模拟器（实际硬件已经成为 practical unobtanium）几乎完全基于手册中描述的行为进行操作。因此，我们预计任何好的模拟都会支持这种序列的某种形式，大多数 Apple-1 模拟器至少会尝试按原样运行它。当然，细节在于手册中未描述的内容，比如监控程序对输入字符的反应，哪些字形被精确显示，甚至光标是什么样子的，所有这些我们之前都是从原理图费力推导出来的。这就是现代 MAME 中的会话样子。顶部，反斜杠用作提示符（在重置或按 ESCAPE 时打印），我们输入了十六进制代码，然后打印了这些内存位置的内容进行验证，然后在故意退格删除一行后——下划线，它在我们内部完全删除了我们输入的内容后将监控程序移到下一行——我们开始运行程序。你会注意到没有办法将虚拟打字机倒回去，所以当我们备份时，字符实际上无法从屏幕上删除。你还会注意到没有指定运行地址（"R"）。在这种情况下，它之所以有效，是因为一个非凡的巧合：当没有给出参数时，执行从最后访问的地址开始（在名为 XAML 的零页变量中跟踪），在这种情况下是 $000a，因为我们最后显示的内存位置，它包含 $00，因为我们把它放在了最后一条指令中，而那是 BRK 操作码，所以 6502 从 WOZMON 的 IRQ 向量 $fffe 拉取，那就是……$0000，程序就从那里运行了。这个程序简单地反汇编为 LDA #0:BACK TAX:JSR $FFEF:INX:TXA:JMP BACK。我假设使用 X 寄存器只是为了使增量更直接，因为 $ffef 处的例程会在视频硬件指示"准备好"接受字符后将累加器中的字符显示出来。当你启动它时，累加器 A 从零开始。前 13 个字符（0-12）是控制字符，实际上永远不会进入移位寄存器。然而，可打印和不可打印的字符具有相同的每帧大约六十分之一秒的接受速率，无论它们的最终处置如何，所以最终会有一个短暂但明显的暂停，什么都不出现。只有当我们得到字符值 13 时才移动到下一行，然后再忽略接下来的 18 个（14-31）字符，这些也是不可打印的（另一个短暂暂停），然后进入可打印范围。字符 32 到 95 以你期望的字形出现，速率相同，尽管字符 96 到 127 也是可打印的，只是 64 到 95 的重复。之后，因为高位从不被传递到视频硬件，代码 128 到 255 与代码 0 到 127 的显示（或不显示）完全相同，然后 X（然后 A）溢出到零。例程以这种方式永久运行，根据需要滚动屏幕，直到停止。通常用重置键停止程序，这是一个非破坏性重启，将我们放回 WOZMON。所有这些都是用一条非常简单的 6502 机器语言程序调用监控程序中的一个简单例程来完成的，该例程只是在 PIA 上的两个位置进行操作，但实际上运行 6502 代码实际上是最简单的（或者说定义最好的）部分，因为显而易见的事情是在……6502 本机上运行 6502 代码，这将以前所未有的速度，完全真实地运行。主要问题在于 vanilla 6502 CPU 上没有 MMU 或硬件虚拟化（你知道我要说什么），所以你无法阻止它读取或写入它认为 PIA 在的地方，或 WOZROM，或任何其他东西，比如你的支持代码，因为内存映射几乎肯定不会匹配。不过，这是一个很棒的 hack，人们当然尝试过，要让它工作需要打破一些规则，所以有趣的部分是哪些规则以及如何打破。我能找到的最早在任何 8 位电脑上运行 Apple-1 的例子，恰好是在 Apple II 上，由 Mark Stock 编写，可追溯到 2006 年。Mark 似乎确实基于手册构建了他的模拟器，所以一些差距不足为奇：例如，它使用块状光标，因为大多数人会假设这就是它所拥有的，而且它用删除功能将打字机后退，因为大多数人会假设这就是它的工作方式。后来提供了补丁进行调整。尽管文件不在 Wayback Machine 上，但至少有一个版本在 Asimov 上。这个模拟器在 Apple II 高分辨率屏幕上显示文本，以便整个 $0000-$3fff 可以被模拟的 Apple-1 使用，包括零页和堆栈，然后将补丁版本的 WOZMON 加载到语言卡 RAM 的 $ff00，这样对它的调用也能工作，同时也给我们正确的 6502 高内存向量。补丁 WOZMON 的原因是 PIA，这是这个模拟器"打破规则"的方式。虽然 PIA 
的地址范围 $d000 在这种配置下可用，但 Apple II 没有空闲定时器中断，服务程序可以定期拾取或放置值，即使有，潜在的比赛条件也是显而易见的。相反，补丁将 PIA 加载和存储变成对其支持例程的调用：STA $D012（本来会向屏幕写入一个字符）会变成 JSR $805B，包括在 $ffef 的例程中，LDA $D011（本来会从键盘读取一个字符）会变成 JSR $8070。如果你的程序没有使用 WOZMON，你就需要自己修补它来做这些事情。Mark 用 Lunar Lander 作为示例。以下是我们的测试向量输出。stock Apple II 也没有自由运行的定时器，所以不可能将视频输出放慢到 Apple-1 的帧率，光标也不会闪烁（而且因为我们在 hi-res 屏幕上，你不能用闪烁字符来作弊）。Apple 的删除键没有映射到任何东西，只是发出一个无意义的字符，但下划线（即 Shift-减号）被视为删除。删除后退和额外的换行都在后来的补丁中进行了更正，但 96-127 字符范围的错误位没有更正。如果有什么的话，这个模拟器的最高速度有点太快了。Apple-1 6502 以标称 1.0227 27MHz 运行，从也用于视频生成的单个 14.318 18MHz（1260/88）晶体分频 14 倍得到，但 RAM 刷新对每 65 个周期中的 4 个施加了惩罚，有效吞吐量为 0.9598MHz。在 Apple II 上，系统以完整的 /14 速度运行，只是每第 65 个周期被同一个主晶体的两个刻度延长，平均速度为 1.020484MHz。可以理解的是，无法避免地，你可以用覆盖它的代码来破坏模拟器（而且，对于这一点，WOZMON），在 NMOS 6502 上，jam 操作码如 $02 仍然会导致系统挂起直到重置。然而，一件有趣的事情是在 Apple II+（出厂为 NMOS）上，CONTROL-RESET 变得像按下 Apple-1 的重置按钮一样，行为实际上是匹配的。在 IIe 及以上版本中，这不起作用，那里只是 CONTROL-RESET，你退出到 BASIC（但你可以 CALL 32768 返回，$02 在 CMOS 65C02 上是一个两字节 no-op）。Simon Owen 的 Apple-1 模拟器似乎是下一个，因为我找不到他 2007 年第一个版本之前的另一个，但他没有在 6502 上做——他在 Z80 上做的，或者更准确地说是在 SAM Coupé 上，这是经典 ZX Spectrum 系列中最先进的系统。Simon 告诉我，他的模拟器是为了模拟 Pom1 中的模拟而写的，Pom1 是 Verhille Arnaud 写的一个早期 Apple-1 模拟器，似乎在 2000 年有了第一个公开版本。因为 Pom1 是用 Java 编写的，你仍然可以运行它，其中一些版本保存在 Wayback Machine 上。我能找到的最后一个版本是 0.70。Pom1 本身似乎也借鉴了我认为是最早在任何平台上编写的 Apple-1 模拟器，那是为 Macintosh 编写的。这是 Sim6502，最初由 Achim Breidenbach 于 1997 年开发，至今仍可从大量 Macintosh 网站或 Wayback Machine 下载。附带的自述文件表明 Breidenbach 也是基于手册中对系统的描述来编写它的，尽管他做出了一些不同的选择：除其他外，Sim6502 在控制字符上推进光标（有效地将它们呈现为空格），也不会显示 96 到 127 的第二组重复字符，只显示为空格，因为不查看原理图，位转换不会很明显。虽然它确实尝试正确模拟字符输出速率，但其 6502 模拟也有一些明显的缺陷，例如完全缺少十进制模式（自述文件中有说明），并且用 R 运行 IRQ 向量失败（你需要用明确的 0R 启动我们的示例，否则它会被视为软重置）。据报道 Breidenbach 正在开发一个解决这些差距的后期版本，但我似乎找不到它被发布的任何证据。尽管有更好的 6502 模拟器和视频改进，Pom1 在其自身的模拟核心中反映了一些 Sim6502 的特质，如块状光标，尽管这些因版本而异。例如，0.62 有一个有效的删除键，可以将打字机后退，但需要你按住 caps lock。比较起来，0.70 正确地只做大写字母，但其删除键只是前进光标，WOZMON 不认为它是删除（你应该按 SHIFT-MINUS 得到下划线）。后期版本还包含了相同或类似的监控程序，拦截我们测试序列执行的 BRK，而 0.62 没有。两种版本在发送到 PIA 输出端口时也不抑制控制字符，这些也会呈现为空格，尽管它确实正确呈现了 96-127 范围。（这些问题后来在 John 
Corrado 维护的本机 SDL 版本中得到修复。恰当地说，Mark Stock 后来将 SDL Pom1 移植到了 Mac OS X。）Simon 的版本基于早期版本的 Pom1，如块状光标、Delete 实际删除、控制字符推进光标。当前版本在我们测试向量中显示 96-127 范围的小写，同样实现了与 Pom1 0.70 相同或相似的监控程序来捕获 BRK。但 Simon 的模拟器特别值得注意的是，它必须用 Z80 汇编语言实现其 6502 核心——因为它运行在 Z80 上。因为它不在处理器上运行 Apple-1 程序，所以不需要调整就能运行任何东西，同样也不会受到许多灾难性故障的影响。例如，如果你尝试运行 jam 操作码，这个模拟器只是忽略它。它还不仅包含 Integer BASIC，还包含 Applesoft BASIC，程序在那里似乎也能正常工作。Simon 正在改进模拟器核心以达到 MAME/MESS 的标准，但现在你可以使用它或在 SimCoupe 模拟器中使用。它也是开源的。当然，在解释器下有效运行代码确实会施加明显的速度惩罚。SAM Coupé 有一个 6MHz Z80，因此 Simon 估计其 6502 模拟器大约以真实 Apple-1 速度的 20% 运行，尽管因为 ~60Hz 限制，模拟器的整体性能在打印时达到或接近本机速度。快进几年，借助更准确的 MAME 驱动程序，8 位 Apple-1 模拟器的极窄子类别也变得更加准确——不过现在需要打破不同的规则。人们可能会惊讶地知道你可以在 Commodore 64 上模拟 Apple II，或者至少模拟 Applesoft BASIC 的某些部分（除了斯巴达式的 Mimic 这样的怪物之外，尽管我仍然很想玩一个），我记得在那个领域至少还有几个。但 2013 年，Alexei Eeben 决定为 Apple-1 做这件事，用他自己的监控程序 C'mon 为 VIC-20 开发并交叉移植，称之为 Green Delicious。Green Delicious 也是本机运行的，直接在 Commodore 64 的 6510 CPU 上运行，对于任何了解 C64 架构的人来说，应该立即突出一个大问题：6510 保留了 RAM（$0000 和 $0001）最低两个位置的芯片内置 I/O 端口。这意味着如果我们尝试输入我们的示例向量……它会在存储第一个或第二个字节时崩溃，而且是崩溃得很厉害。任何期望使用或从那里启动的程序将根本无法工作。因此我们需要从另一个位置（比如 $0300）运行我们的测试，从那里它按预期工作并显示正确的字符集和正确的大致速度。个人看法，Green Delicious 无疑是所有平台上最好的 Apple-1 模拟器之一，不仅仅是 8 位平台，从像素精确的输出（你可以用 F3 改变颜色，如果你不喜欢绿色）到闪烁的彩虹苹果标志（你可以用 F5 恢复经典的 @）。事实上，我在其外观上发现的唯一小缺陷是光标闪烁率有点不对——555 的周期大约是半秒，三分之二是导通相位。Green Delicious 使用自己的变通方法来处理 PIA 访问，它预修补的内置 WOZMON 版本也有一个活动的 IRQ 向量，它用这个向量进行内务处理。（这个向量似乎不处理 BRK 指令，那也会让它崩溃。）然而，与我们早期基于 Apple II 的模拟器不同，Green Delicious 的内置加载器会动态修补你的二进制文件，并记录什么被修补到内存中的什么位置，以便你看到它做了什么。令人惊讶的是，这个自动修补程序在很多二进制文件上无需修改就能工作。BASIC 程序运行良好，而且因为解释器以或高于真实 Apple-1 的速度运行，它们的运行速度非常快。确实，这是 Green Delicious 最强的领域。另一方面，如果你尝试运行本机代码，也很容易——甚至是无意地——崩溃模拟器，而且往往是 64 本身，在用真正的 64 和真正的 1541 磁盘驱动器进行几次硬停止后，你会真正崩溃，等待它重新加载。所以让我们回到 Simon 在 SAM Coupé 上的想法，即在仿真中运行 Apple-1 的 6502 代码，但这次我们将使用一个 6502 虚拟化库来复用 6502 自身的 ALU 以提高速度和正确性，同时完全隔离内存并在所有时候都保持对 guest 处理器状态的完全控制。嗯，我想知道我们能在哪里找到这样的库？？想想想想！……VA1，即 Virtual Apple-1，将运行在 Commodore 64 或至少 48K 的 Apple II+ 上；这里我们在模拟的 Commodore SX-64 上运行它。为了处理屏幕，我们使用基于 Signetics 
字形形状的定制字符集，并将屏幕设置为 24 行模式，一切从第二行开始，这样滚动"就好了"。Commodore 版本像 BASIC 程序一样 LOAD 和 RUN，并立即进入 WOZMON。你可以看出我们现在与主 CPU 隔离了，因为我们可以将测试程序存储在 $0000……运行它，它不仅没有崩溃，而且成功运行了，与 MAME 的结果相同。由于 C64 的 jiffy 时钟（由 Timer A 中断维护）与 Apple-1 的视频硬件大致相同步，我们用它来将字符输出和光标闪烁同步到正确的时间。这与我们在 Oblast 中使用的基本思路相同，只是我们在那里故意让 Timer A 超速运行来加快游戏。光标是通过在虚拟机的每个周期检查 jiffy 时钟来维护的，并相应地更改字符代码。我们只是在这里使用 Kernal 的字符输出例程，它有效地维护自己的（禁用的）光标在每个屏幕逻辑行上。我们使用这个位置来绘制我们自己的光标，如果我们要去新的一行或即将滚动屏幕，就移除它，并额外偏执地确保 Kernal 屏幕编辑器映射中的每个逻辑屏幕行与相同的物理屏幕行完全对应。VA1 有 Apple-1 特殊功能的键等效项，如其他模拟器。ESCAPE 键用 F1 发送，你可以用 INST/DEL 删除（或按退格键得到下划线，这是相同的 ASCII 值）。要重置 guest Apple-1，Commodore 64 版本使用 RESTORE 键；我偶然发现产生一个 NMI 在宿主计算机上产生重置信号来在 guest 上产生重置信号是合适的。只需轻触它，虚拟机就会安全地将你放回 WOZMON，内存完好。6o6 可以抛出异常，VA1 会处理它们。在这里，如果你运行一条非法指令（未记录的 NMOS 指令故意不支持，抱歉），比如 jam 操作码，64 不会崩溃。相反，6o6 抛出一个非法指令异常，VA1 打印一条消息（你知道它来自模拟器，因为它立即出现），并将你直接放回 WOZMON 并指向违规地址。更重要的是，你也不能从模拟器内部覆盖 WOZMON。请注意，因为 VA1 的 harness 合成了一个完整的 64K guest 地址空间且 6o6 hypercall 被禁用，实际上你可能得到的唯一异常就是非法指令。VA1 长 109 个块，故意不压缩，这样你就可以在你从磁盘加载的任何东西上原地覆盖它。它提供了一个 8K RAM 系统；默认情况下，它在虚拟地址 $E000 的高 4K RAM 中配备 Integer BASIC，在虚拟地址 $0000 的低 4K RAM 中配备 Hamurabi [原文如此]。如果你输入 E2B3R 来热启动 Integer BASIC，然后输入 RUN，游戏就开始了。运行 BASIC 程序时，处理器模拟的拖沓是显而易见的。Simon 声称他的 SAM Coupe 6502 模拟器为 20% 本机速度是基于一个每时钟周期（6MHz 对 1.0227 27MHz [NTSC] 或 0.98524861 1MHz [PAL]，但 Z80 指令平均需要更多时钟周期）大约快两到三倍且有更多寄存器的处理器，而且是专门编写的模拟器，而不是这里更通用的模拟器。虽然 VA1 中的 harness 和 kernel 非常精简，但它们确实增加了相当大的成本，而且在没有显示输出生成的情况下，VA1 大约比真实 Apple-1 慢十到二十倍，这取决于指令组合。然而，当发出字符时，两个模拟器都大致以全速运行，因为现在视频硬件成为速率限制因素。因为 VA1 未压缩，你可以按 RUN/STOP-RESTORE（不仅仅是 RESTORE）并在其上加载一些东西，然后重新 RUN 该程序。你甚至可以在运行它之前就加载它。虚拟地址 $0000 的低 4K 银行位于物理地址 $1000，虚拟地址 $E000 的高 4K 银行位于物理地址 $2000。这里，我们将 Lunar Lander 的汇编语言版本加载到虚拟地址 $0300（物理地址 $1300），然后通过运行它返回 VA1……然后用 300R，你可以以与真实 Apple-1 无法区分的速度玩一个未修改的 Lunar Lander。（我觉得。）我决定这样做的原因是，首先我很懒，其次如果没有很多额外的代码，它会成为一个更好的演示示例。如果你想要一个更华丽的 Commodore 64 上的 Apple-1 模拟器，你只应该运行 Green Delicious。但你不能在 Apple II 上运行 Green Delicious，但你可以运行 VA1。它作为二进制程序运行，使用 BRUN。和 Mark Stock 早期在 Apple II 上运行 Apple-1 的模拟器一样，因为 stock Apple II 
也没有自由运行的时钟，这个模拟器也不会将文本输出限制在 60.05Hz。至于闪烁的光标，那只是因为我选择了 Apple II 视频硬件生成的闪烁 @ 作为光标，因为再次，没有时钟源可以同步它。无论哪种方式，在 Apple II 上我们同样能够输入我们的测试程序……并运行它。Delete 键，如果有的话，也映射到下划线，或者你可以像往常一样按 SHIFT-minus。用正确的光标键（Apple-1 不使用的那个），你可以对 guest Apple-1 执行非破坏性重置，就像在 C64 版本中按 RESTORE 一样。如果你的键盘有向上键，那将清除屏幕（或按 CONTROL-K）。Apple II 版本也使用内置 ROM 例程来打印和滚动屏幕，并使用与 C64 版本类似的代码来定位自己的光标。CPU 核心在其他方面完全相同，抛出相同的异常，并以相同的方式故障保护。你也有相同的 Hamurabi 程序加载到低 4K……你可以在这里用你自己的程序从磁盘覆盖 VA1 的内存。按 CONTROL-RESET，提供的磁盘映像上的二进制文件可以简单地 BLOAD 到原位。Apple II 版本具有相同的模拟内存布局，低 4K（$0000）位于物理地址 $1000，高 4K（$E000）位于物理地址 $2000。我们加载 Lunar Lander，然后用 CALL 2051 热启动模拟器……然后用 300R 运行它。更多软件可从 Apple1Software.com 获取，但你需要为二进制文件或 PRG 文件在 VA1 中编码正确的起始地址（在两个平台上）。最后，它在真正的 Commodore SX-64 上运行，提供了与原始 Apple-1 一样多的（如果不更多的话）便携性（虽然它更重一点）。如果有人想把它移植到 Atari 8 位（完全在它的能力范围内），向我发起 pull request，我会看看。现在，生日快乐，苹果。在开发 6o6 升级时，我开始思考进一步减少内存抽象对核心影响的方法，因为我不太可能在使处理器模拟本身更快方面有太多办法了。对于下一版本的 6o6，我想创建一个用于堆栈推送和拉取的快速路径——对于 Incredible KIMplement 来说这不是主要需求，因为 KIM-1 没有大量用于存储子程序（更不用说调用它们）的内存，但更大的程序肯定会使用它们，而且可能经常使用。这可能可以通过一组新的宏来完成，所以技术复杂性不应该很高。但另一件我想考虑的事情是某种动态页表，有效地成为一个声明式 harness。因为页表已经有了虚拟页面的解除引用的物理位置，VM 可以直接去获取它，而一个标志值可以告诉 VM 什么时候需要直接咨询 harness——在这种情况下，harness 仍然像现在一样处理加载/存储操作，但它也可以按需进行调整和修复页表，甚至抛出异常。（嘿，我们刚刚重新发明了页错误！）我需要在这方面做更多思考，最好是考虑如何在不破坏当前 API 的情况下实现它，但这可能会大大简化典型情况的内存访问，所以我觉得这是值得的。敬请期待这些和其他变化。与此同时，你可以在 6o6 Github 项目中查看 v1.1 的变化，或从 Releases 选项卡下载 C64 和 Apple II 的更新演示磁盘映像和程序，其中包括 VA1。6o6 根据 Floodgap Free Software License 发布。另外，Incredible KIMplement——最初的 Commodore 64 KIM-1 模拟器，6o6 最初就是为它编写的——也已更新到新的处理器核心，加上构建系统的更改以提高未来的可维护性。你可以从 KIMplement 页面获取，或在 Github 上浏览源代码。它也根据 Floodgap Free Software License 发布。
