When Two EPROM Programmers Disagree: A Cross-Validation Workflow

Part 1 of 2. Part 2, published soon, covers the follow-up bench experiments to identify exactly why the GQ-4x4 was misreading.

I'd recently picked up a fresh batch of M27C512 EPROMs and wanted to see how well they programmed before adding the lot to working stock. I was also overdue to update my personal diagnostic cartridge, so I grabbed a tube of the new chips, burned my updated diagnostic ROM onto twelve of them, and split the burn — six on the GQ-4x4 and six on the EMP-20, the two programmers I trust.

The validation plan was simple: read every chip back on the other programmer, compare to the source file, confirm the lot was good. Belt and suspenders.

Eight chips passed cleanly. The last four came back from the GQ-4x4 with verify errors. The EMP-20, hours earlier, had read those same chips and said they were perfect. One of the programmers was lying. They were both confident.

This is the writeup of how I figured out which one was lying, and the workflow I now use to keep both honest. If you burn EPROMs for chips you actually ship (replacement BIOSes, arcade ROM swaps, any community PCB work), there's a methodology here you can lift directly.

The cast

A quick rundown of the bench gear, because the contrast turned out to matter:

Needham's EMP-20. Released in the late 1990s, runs DOS software off a parallel port, has a separate "family module" PCB you swap depending on what you're burning. Its device list still calls STMicroelectronics "SGS_Thomson" because it predates the 1998 rebrand. Slow, ugly, and built like a tank.

GQ-4x4. Modern USB programmer from MCUmall. Compact, fast, supports thousands of parts, runs on any current Windows machine. The kind of unit you'd buy today if you were starting from scratch.

The chip. STMicroelectronics M27C512-70XF1. 64KB CMOS UV-erasable EPROM, 70ns access time, classic ceramic DIP with the little quartz window. The workhorse of building retro carts.

The file. A 65,536-byte Atari 2600 diagnostic ROM, checksum 2B66 / D49A. The same family of tool I used to dial in the Atari 2600 bundle I picked up last weekend.

The plan: split the batch — six chips burned on the EMP-20, six burned on the GQ-4x4 — then cross-verify everything on the opposite programmer. If both units are healthy, every chip should pass both ways.

[Image: EMP-20 programmer connected to a Toshiba laptop running its DOS interface]
"The bench for this batch. EMP-20 on the left, the DOS host laptop on the right with the Main Menu up. Oscilloscope on the shelf wasn't called in for this one."

The baseline workflow

Before I get to the part where things went sideways, here's the burn workflow I use on every chip. On the EMP-20 it's four keystrokes:

Key | Operation | What it actually proves
2 | Verify device is erased | The chip is blank. Sounds obvious, but partially-erased EPROMs read as 0xFF cold and then fail mid-burn. The single most common cause of "why won't this chip take?" Worth five seconds.
1 | Program with Quick Pulse | Burns each byte at Vcc = 6.25V (the programming voltage; yes, this is normal) and inline-verifies as it goes.
3 | Verify device equals buffer | Reads the chip back at Vcc = 5.00V, the operating voltage. Critical, because cells that programmed weakly can read fine at 6.25V and fail at 5.00V.
N | Device checksum | 16-bit sum of every byte on the chip. Should match the buffer checksum displayed on the screen. If yes, you're done.

That's 2 → 1 → 3 → N. For any chip going into hardware I can't easily revisit, I add one more pass: eject the chip, reinsert it, and run 3 → N again. Catches socket gremlins, marginal cells, bent pins. Thirty seconds well spent.
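The N step can also be reproduced away from the programmer. Here is a minimal Python sketch, assuming the device checksum is a plain sum of all bytes truncated to 16 bits, as the table describes it. The function name and the file name in the usage comment are placeholders, and the sketch does not try to say which of the two values the EMP-20 displays (2B66 / D49A) this sum corresponds to.

```python
def checksum16(data: bytes) -> int:
    """Sum every byte and keep only the low 16 bits."""
    return sum(data) & 0xFFFF


# Usage (file name is a placeholder for a saved chip dump):
#   with open("chip01.bin", "rb") as f:
#       print(f"{checksum16(f.read()):04X}")
```

One side effect worth knowing: under this assumption a fully blank 64KB chip (every byte 0xFF) sums to 0000, so a zero checksum on a chip that should hold data is a red flag.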

So far so good. All twelve chips burned cleanly on their respective programmers, and the checksums matched both ways. Time for the cross-verify pass.

All twelve eventually verified clean against the EMP-20 — but the cross-pass on the GQ-4x4 revealed something else entirely.

[Image: EMP-20 DOS status screen showing M27C512 settings, checksums, and programming voltages]

"Buffer checksum 2B66 / D49A — matches the source. Pgm Vpp 12.75V, Pgm Vcc 6.25V, Quick Pulse algorithm. Belt before suspenders."

The disagreement

Eight of the twelve chips passed the GQ-4x4's verify cleanly. Four came back with errors. Same kind of error each time:

Verify Failed. Address=0x000002, Device=0xA2, Buffer=0xA9

[Image: GQ-4x4 verify failed dialog: Address 0x02, Device A2, Buffer A9]

"Verify Failed, Address=0x000002, Device=0xA2, Buffer=0xA9. Four chips, same error, four times in a row."

The GQ-4x4 was saying the byte at offset 2 is 0xA2 when it should be 0xA9. The EMP-20, on the same chips, said the byte at offset 2 was 0xA9.

Worth pausing on the asymmetry, because it sharpens the diagnosis: six of the twelve chips were programmed BY the GQ-4x4 in the first place — and those same six chips verify bit-perfect when read back on the EMP-20. So the GQ can write correctly. It's the read side that's intermittently lying.

That's unusual. Programming an EPROM is the harder operation: 12.75V Vpp, sustained programming pulses, careful signal hold timing. Reading is the easy operation: drive the address bus, sample the data lines. A unit that handles the hard thing fine but stumbles on the easy one is telling us the fault is specifically in the read path, not in the unit overall.

The 4-of-12 failure rate is its own clue. A hard hardware fault (stuck pin, dead component) would fail every chip every time. A software profile bug would fail at the same offsets, same wrong values, deterministically. Random intermittent failure — fine most of the time, occasionally wrong — is the signature of a marginal analog issue: timing right at the edge, or a power rail sagging during read sampling, or a sense amplifier operating near its threshold.

So before I even look at the bit pattern of the wrong byte, the shape of the failure has already narrowed the field. The chip is probably fine. The GQ-4x4's read path is probably analog-marginal. Now to confirm it.
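The deterministic-versus-intermittent distinction can be put to a rough test if you have several dumps of the same chip from the same programmer. The Python sketch below assumes exactly that; it is not what was done at this point in the story, where the evidence was pass/fail across twelve chips. The function names and the three labels are illustrative only.

```python
def unstable_offsets(reads: list[bytes]) -> list[int]:
    """Offsets whose value is not identical in every read of the same chip."""
    if not reads:
        raise ValueError("need at least one read")
    if len({len(r) for r in reads}) > 1:
        raise ValueError("reads have different lengths")
    # zip(*reads) yields, per offset, the byte seen in each read.
    return [i for i, column in enumerate(zip(*reads)) if len(set(column)) > 1]


def failure_shape(reads: list[bytes], source: bytes) -> str:
    """Rough label for how repeated reads relate to the source image."""
    if unstable_offsets(reads):
        return "intermittent: reads disagree with each other"
    if all(r == source for r in reads):
        return "clean: every read matches the source"
    return "deterministic: reads agree with each other but not the source"
```

Reads that disagree with each other point toward the marginal-analog family of causes; reads that agree with each other but not with the source point toward something repeatable, such as a wrong device profile or a genuinely bad chip.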

Bit-flip analysis: step one in any EPROM failure

When a verify fails, the direction of the bit errors usually tells you the cause. EPROMs erase to all 1s; programming flips bits from 1 to 0. So:

  • Expected 0, got 1 = a bit that was programmed now reads as erased. Classic incomplete programming. The cell didn't hold full charge. Re-pulse usually fixes it.
  • Expected 1, got 0 = a bit that was erased now reads as programmed. Weird. Either contamination or you're reading the wrong cell.
  • Mixed direction in the same byte = the chip probably contains the correct data, but the programmer is reading the wrong cells, or reading them incorrectly.

XOR the bytes to see which bits differ:

Expected: 0xA9 = 1010 1001
Got:      0xA2 = 1010 0010
XOR:      0x0B = 0000 1011  ← bits 0, 1, 3 differ

Bit | Expected | Got | Direction
3   | 1        | 0   | 1→0
1   | 0        | 1   | 0→1
0   | 1        | 0   | 1→0

Two flips one way, one the other. Mixed direction. The chip likely contains correct data — somebody is reading it wrong.

(Worth admitting: at first glance I told myself this looked like an incomplete-programming pattern. Then I did the actual math and the truth shook out. The lesson: do the bit-by-bit XOR before you assign a cause. Pattern-matching from memory will mislead you.)
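To take the mental arithmetic out of it, the same XOR-and-direction check can be scripted. This is a minimal Python sketch; the function names and the wording of the labels are made up for illustration, and directions are written as expected->got to match the worked example above.

```python
def flip_directions(expected: int, got: int) -> list[tuple[int, str]]:
    """Return (bit, direction) for every differing bit, highest bit first."""
    diff = expected ^ got
    flips = []
    for bit in range(7, -1, -1):
        if diff & (1 << bit):
            exp_bit = (expected >> bit) & 1
            flips.append((bit, "1->0" if exp_bit else "0->1"))
    return flips


def classify(expected: int, got: int) -> str:
    """Map the flip directions onto the three cases described in this section."""
    directions = {d for _, d in flip_directions(expected, got)}
    if not directions:
        return "match"
    if directions == {"0->1"}:
        return "programmed bit reads erased"
    if directions == {"1->0"}:
        return "erased bit reads programmed"
    return "mixed"


print(flip_directions(0xA9, 0xA2))  # [(3, '1->0'), (1, '0->1'), (0, '1->0')]
print(classify(0xA9, 0xA2))         # mixed
```

The label is only a starting hypothesis, in the same spirit as the "usually" in the rule of thumb; it says where to look first, not what the fault is.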

When one-programmer verification isn't enough

The bit-flip analysis tells me the chip is probably fine and one of the programmers is misreading it. But which one? I need an independent source of truth that doesn't rely on either programmer's verify function.

This is where the chain-of-trust problem becomes obvious. The EMP-20's verify only proves that the chip matches the EMP-20's buffer. If the buffer got corrupted on load (wrong file type selected, cosmic ray, bad RAM), every chip in the batch would be a perfect copy of the wrong data and the programmer would happily verify all of them.

To break the loop, the chip's contents have to be compared against the original file on disk, using a tool that has nothing to do with the programmer that wrote it.

The audit-trail extension

For each chip I burn for a batch I'm shipping, I now add three steps after 2 → 1 → 3 → N:

A. Read device into buffer. On the EMP-20 this is Option 4. Pulls the chip's contents back into programmer memory.

B. Save buffer to disk. Option 9. I name them systematically: chip01.bin, chip02.bin, and so on.

C. Filesystem-level diff against the source. Drop to a DOS prompt:

fc /b chip01.bin source.bin

fc /b is the ancient DOS binary-compare utility. Reads two files, byte by byte, reports any differences. Expected output:

Comparing D:\EMP20\CHIP01.BIN and D:\EMP20\SOURCE.BIN
No differences

Three independent confirmations per chip: internal verify, checksum match, and now filesystem-level diff against the source using a tool with no idea what a programmer even is.
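If the dumps end up on a machine without fc, the same byte-by-byte comparison is a few lines of Python. This is a sketch of the idea rather than a reimplementation of fc /b: the function names and the output layout are this example's own, and files of different sizes are simply reported as a mismatch.

```python
from pathlib import Path


def binary_diff(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, byte_a, byte_b) for every position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_files(path_a: str, path_b: str) -> bool:
    """Print any differences between two files; return True if identical."""
    a, b = Path(path_a).read_bytes(), Path(path_b).read_bytes()
    if len(a) != len(b):
        print(f"Sizes differ: {len(a)} vs {len(b)} bytes")
        return False
    diffs = binary_diff(a, b)
    for offset, x, y in diffs:
        print(f"{offset:08X}: {x:02X} {y:02X}")
    if not diffs:
        print("No differences")
    return not diffs


# Usage (file names follow the naming scheme in step B):
#   compare_files("chip01.bin", "source.bin")
```

What matters is the same property that makes fc /b useful here: the comparison runs on files on disk and knows nothing about either programmer.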

For the full audit trail across a batch, I also cross-compare chips against each other:

D:\EMP20>fc /b emp1.bin emp2.bin
No differences

D:\EMP20>fc /b emp1.bin source.bin
No differences

D:\EMP20>fc /b emp1.bin emp3.bin
No differences

D:\EMP20>fc /b emp1.bin emp4.bin
No differences

D:\EMP20>fc /b emp2.bin source.bin
No differences

D:\EMP20>fc /b emp3.bin source.bin
No differences

D:\EMP20>fc /b emp4.bin source.bin
No differences

[Image: DOS terminal on a Toshiba laptop showing fc /b binary compare results, all No differences]

"fc /b on the Toshiba. Every chip readback identical to emp1.bin, and emp1.bin identical to crtdiag.bin — the source. The audit trail in plain text."

Twelve chips. All identical to each other on EMP-20 readback. All identical to the source file. The EMP-20 was telling the truth — about every chip in the batch, including the six the GQ-4x4 itself had programmed.

That's the whole audit-trail argument in one screenful. If anyone ever questions whether a cart in this batch is correct, that's the evidence.
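For larger batches, typing one fc /b per chip gets tedious. A hedged alternative, not what was used on the bench here, is to hash every dump and compare each digest against the source file's digest. The Python sketch below swaps in SHA-256 from the standard library for the byte-by-byte compare; the function names and the chip*.bin glob pattern are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_batch(source: Path, dumps: list[Path]) -> dict[str, bool]:
    """Map each dump's file name to True if its digest equals the source's."""
    want = sha256_of(source)
    return {p.name: sha256_of(p) == want for p in dumps}


# Usage (paths are placeholders):
#   results = audit_batch(Path("source.bin"), sorted(Path(".").glob("chip*.bin")))
#   for name, ok in results.items():
#       print(name, "OK" if ok else "MISMATCH")
```

A digest match says the files are identical but not where a bad one differs, so any mismatch still calls for a byte-level compare against the source.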

The verdict — Part 1

Every chip is fine. The EMP-20 is fine. This particular GQ-4x4 can program correctly; it just can't read consistently. (To be clear, that's the unit on my bench, not an indictment of the model.) The 4-of-12 failure rate tells us the issue is intermittent and analog, not deterministic and digital.

But "intermittent and analog" isn't an answer. How is it broken? A read-timing margin issue? A USB power problem? A sense-amplifier near its threshold? A firmware bug in the read routine specifically? Each one fits the fingerprint differently, and they need different fixes.

The shape of the lie tells you the cause. And to see the shape, you need to look at all of the GQ-4x4's misreads at once, not just the first one.

In Part 2, I take one of the misreading chips back to the GQ-4x4, dump it to a file, drop the file into HxD on a modern Windows machine, and do a visual side-by-side compare against the source. The error pattern that comes back is diagnostic: different failure modes produce different fingerprints, and this one has the fingerprint of a very specific kind of electrical problem.

Then I take the GQ-4x4 to the bench and run the experiments that confirm (or refute) the diagnosis. Three sequential reads of the same chip. USB cable variations. Powered hub. Firmware check.

The verdict, the experiments, and the workflow update — coming in Part 2.

What you can take away from Part 1

If you burn EPROMs for chips you ship, even a few at a time:

  1. Single-programmer verify isn't enough. A programmer that wrote a chip can only confirm the chip matches its own buffer. The buffer can be wrong.
  2. Running fc /b chipNN.bin source.bin at a DOS prompt is the cheapest possible audit trail: it is independent of the programmer and costs you thirty seconds per chip.
  3. Do the bit-flip direction analysis before you guess at causes. XOR expected against actual, and look at which bits flipped in which direction. Pattern-matching from memory will lie to you; the math won't.
  4. When two programmers disagree, the disagreement itself is data. The "broken" one is telling you something specific about its own failure mode. The shape of the lie is the diagnosis.

The total overhead of the audit-trail workflow is about two minutes per chip beyond the bare-minimum verify. For a batch of twelve chips, that's twenty-four extra minutes to know — with byte-for-byte certainty — that every chip in the batch is bit-perfect.

For chips going into hardware I can't easily revisit, that's an obvious yes.


Part 2 picks up with the GQ-4x4 in pieces — what HxD revealed about the error pattern, what the bench experiments confirmed, and how the workflow changed afterward.

I'm Jeffrey Mays. Bench Notes is where I write up the actual workshop work — burns, builds, repairs, occasional drama. The printable workshop reference card that came out of this batch will be available alongside Part 2. Come back for it.
