Remember the Yellowstone disk controller card that I designed back in 2018? It was an FPGA-based clone of the Apple II Liron controller, with aspirations to eventually become a universal reconfigurable disk controller. And it worked nicely when it was the only card installed in the computer, but things went haywire when too many other expansion cards were also present. I eventually gave up and abandoned the project, but now I think I’ve fixed it.
The symptoms were documented in a series of blog posts here, here, here, and here. The more expansion cards present along with Yellowstone, the more likely I was to see errors such as unexplained resets and lockups and drops into the Apple II system monitor. Investigation with an oscilloscope showed lots of nasty looking signals, huge over and undershoot on the data bus, and strange transients on the power supply during card I/O.
There were plenty of theories to explain the problem, and I received over 100 comments from helpful readers. Some theories put forward were: poor grounding, insufficient bulk capacitance, a too weak 3.3V regulator, impedance mismatch, bus fighting, failure to meet the minimum input high voltage, a too strong or too weak drive from the bus driver IC, wrong FPGA slew rate, bad scope probes, bad power supply in the Apple II, a race condition in the logic, and more. My own best guess was a combination of grounding and impedance problems. I spent many weeks chasing various theories without much success. I hacked the card and replaced its bus driver with one from a different logic family. I even started to wonder whether the whole problem lay with the computer rather than the card. By March 2018 I gave up in frustration.
Two and a Half Years Later
This past week I’ve been investigating ideas for an Apple II video card, and a reader pointed me to this tech note about Apple IIgs expansion card design. The point was to learn which signals were provided to the different slots, but my eye caught a different paragraph titled “Avoiding Bus Fights”. As the text described,
“To avoid potential (or actual) bus fights, it is helpful to avoid driving read data from an expansion card onto the bus immediately after PH0 rises. … If a card drives data onto the expansion slot data bus immediately after PH0 rises, there may be a bus fight between the expansion card trying to drive the bus, and the Apple IIGS (or Apple IIe) bus buffers, which may not have turned around yet. … Developers can avoid bus fights by simply using 74LS or 74HCT series parts and relying upon typical delay stackups to delay driving the data bus for approximately 30 nanoseconds. A more solid technique is using the first rising edge of the 7M clock, after PH0 rises.”
My card responds when the Apple II asserts the I/O SELECT signal for its slot, which happens at the same time as PH0 rising. What this paragraph says is that the card should intentionally wait at least 30 ns before responding, because the motherboard’s 74LS245 bus driver is still driving the data bus even after PH0 and the assertion of I/O SELECT!
At first glance this seems ridiculous. Why would the Apple II assert I/O SELECT for a card before it’s safe for the card to output data? But if you assume the card is built with 1978 vintage ICs that can’t respond very quickly anyway, it wouldn’t have been a problem. The trouble only appears when you use an FPGA and modern logic families like 74LVC with propagation delays of just a few nanoseconds. It becomes necessary to add an artificial output delay to avoid bus fighting.
Several readers had suggested more or less exactly this, including Fluffysheap who described it perfectly in the comments here. But I must have been too frustrated or too tired back then, and I never fully followed through on checking this theory.
Bus fighting almost perfectly explains the horrible signals I observed on the scope. For a few tens of nanoseconds at the beginning of my card’s data output, it was fighting with the motherboard’s bus driver, creating eight short circuits on the 8-bit bus. This caused a surge of current, resulting in horrible power supply transients and wild swings on the bus. From my scope observations this period seemed to last about 70-80 ns, rather than the 30 ns mentioned in the tech note. But the tech note described the Apple IIgs, not the Apple IIe that I used for my tests. Maybe the Apple IIe bus driver is slower to shut off.
One thing that bus fighting doesn’t seem to explain is why adding more peripheral cards would make the problem worse. It appears my card was engaging in a bus fight with the motherboard’s own bus driver, and the other cards were just innocent bystanders. The only affect of their presence would be to increase the bus capacitance. I may still be missing something here.
Armed with this newfound knowledge, I went to edit the Yellowstone FPGA source to insert an intentional delay before enabling the card’s bus driver for output. Lo and behold, code for creating a delay was already there, but commented out. It was written by me. I can’t remember if I ever tested it back in 2018. Maybe I had the idea but never tried it, or maybe I tried it but something went wrong. Either way, I gave thanks to 2018-Steve and just reapplied the already-existing code.
At first there was some comedy, because I tried several different changes that appeared to have no effect. After half an hour, I realized I was rebuilding the FPGA configuration file after each change, but then programming the old configuration file from 2018. Oops.
What can I say, it works. I loaded up my Apple IIe with a sampler of six different expansion cards in different combinations, connected to a variety of Floppy Emus and real drives, including a Smartport-aware Unidisk 3.5 drive. Everything worked as expected, and there were no unexplained resets or other weird behavior.
I looked at the data bus and the power supplies on the scope, and everything appeared cleaner than before. The power supplies looked OK. There was still some overshoot on the databus when the card first started driving, but much less than before. Maybe this can be improved further by adding some small inline resistors on the next version of the card. I adjusted the output delay to about 120 ns, which is probably much longer than necessary, but it still leaves more than ample time for 2020-era logic chips to do their jobs.
Things look good with Yellowstone on the Apple IIe, but the IIgs is another story. I have to switch the IIgs to normal speed instead of fast, but then the card should work. A real Liron disk controller card works fine in a IIgs, as long as the system speed is changed, so there’s no fundamental incompatibility. Unfortunately the Yellowstone card just plain doesn’t work on the IIgs. At first I thought that was a result of my new output delay, but removing the delay didn’t help. Then I dug through all my notes from 2018 and concluded that the card never worked on the IIgs under any circumstances. So this isn’t a new problem – it’s an old problem I just hadn’t noticed because it was overshadowed by the bus fighting.
The sort-of-good news is that this failure of Yellowstone on the IIgs seems repeatable and debuggable. Instead of weird spurious resets and lockups like I was seeing two years ago, it looks like it’s just a communication error with the drive. The Yellowstone card firmware appears to be running OK, and I get reasonable error messages like “NO DEVICE CONNECTED”. With a Unidisk 3.5 drive attached, there’s no drive activity. With a Floppy Emu attached and configured for Smartport emulation, it reports a checksum error.
I suspect there’s another timing problem here, but this one relates to writes to the card instead of reads. Perhaps I’m not latching the data from the bus at the right time, and due to minor differences between the bus timing on the IIe and IIgs, it still works OK on the IIe but occasionally writes the wrong values to the IWM chip on the IIgs, or fails to write anything. That would cause garbled communication with the drive.
I’m calling a stop for the moment, feeling pleased with this new progress, and optimistic that I’ll eventually find an explanation for the IIgs behavior.