http://halicery.com/Hardware/ECP Parallel Port transfer/ECP cross-over.html

PC parallel port data transfer with home-made ECP cross-over cable

Keywords: parallel port; SPP; EPP; ECP; DMA; Super I/O chip; 440BX;

Intro

My serial transfer peaked at only around 10 KB/s at 115200 baud. I had heard that the PC parallel port can transfer data much faster, especially in the EPP/ECP modes that promise megabytes per second. Does it? And is it possible to connect two PCs through the parallel port somehow and transfer data at that rate? What is the maximum speed the hardware can handle? This page is about my investigations of the parallel port, using direct hardware programming on a bare PC.

I ended up in ECP mode, making my own cables, with a steady transfer rate of around 1 MB/s using ISA/LPC DMA.

What does the hw give?

The parallel port comes, of course, from the original IBM PC, where it was meant to connect a character printer, in one direction only, with all handshaking and error handling under software control. Today it is called SPP (Standard Parallel Port).

The nibble, PS/2, EPP and ECP parallel port modes all use the same number of pins, but some pins are redefined, many things are under hw control, and there is additional support. The legacy modes (SPP, nibble, PS/2) are way too slow, and EPP is not a peer-to-peer protocol, so these are useless for connecting 2 PCs with some kind of parallel cable to transfer data fast back and forth.

ECP HW and protocols

1993. "ECP has been jointly developed by Microsoft and Hewlett-Packard with the hopes of making it a widely adopted standard. Hewlett-Packard and Microsoft recommend that the ECP modes be incorporated into BOISE, the proposed IEEE 1284 specs." See Extended Capabilities Port Protocol and ISA Interface Standard, July 14, 1993. In 1994 the IEEE took over, and the draft became IEEE 1284-1994, which brought together BOISE, EPP and ECP. HP and Microsoft Corp. made major contributions to the spec.

ECP requires new hardware and extends the protocol part:

The ECP PROTOCOL: Interlocked handshake

The ECP chip transfers 8-bit data in either the forward or the backward direction (selected by DCR bit 5). It uses 2 control and 2 status lines of the parallel port, and the handshake, timings and constraints are the same for both directions, which makes the connection symmetric. So it is possible to connect 2 PC parallel ports with an ECP cross-over cable and transfer data in either direction (of course the direction must be agreed upon first; this is where IEEE 1284 comes into the picture, but that is beyond ECP).

The basic ECP protocol side by side (PD7-0 is not drawn, as it simply goes straight through):

        ECP FORWARD                                                 ECP BACKWARD
                                                   
        1. Waits for BUSY to go low..
                                                                    2. Sets AF low: ready to receive
        3. Drives data byte onto PD7-0
                                                                    4. Waits for ACK to go low..
        5. Sets STROBE low
                                                                    6. Sets AF high
        7. Waits for BUSY to go high..
                                                                    8. Reads data byte from PD7-0
        9. Sets STROBE high
                                                                    10. Waits for ACK to go high..
        goto 1.                                                     goto 2.


PIN                                   write data                                                     PIN
                                          |3                                    
     ______      ________________1________|_____5             ______1________________         ___
 1   STROBE --->                                \________7___/9                        ---->  ACK    10
                                                                                  
                 ____________________2                  _______ ________                      __
11   BUSY  <----                     \______4__________/6      |        \2___________  <----  AF     14
                                                               |8                           
                                                            read data                            
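
To check that the interlock really holds no matter how fast each side runs, here is a minimal Python sketch (not the author's code). The cable is modeled as three shared variables, and each side is a generator that yields whenever it would busy-wait on a line. I assume the receiver also waits for ACK to go high again before returning to step 2, as the timing diagram shows. The names Wires, sender, receiver, run and the tx_steps/rx_steps knobs are hypothetical; the knobs only change the relative speed of the two sides.

```python
class Wires:
    """Logic levels on the cross-over cable (True = high)."""
    def __init__(self):
        self.strobe = True   # sender's STROBE, seen by the receiver as ACK
        self.af = True       # receiver's AF, seen by the sender as BUSY
        self.data = 0        # PD7-0


def sender(w, payload):
    """ECP FORWARD side; yields whenever it would busy-wait on a line."""
    for byte in payload:
        while w.af:              # 1. wait for BUSY to go low
            yield
        w.data = byte            # 3. drive data byte onto PD7-0
        w.strobe = False         # 5. STROBE low
        while not w.af:          # 7. wait for BUSY to go high
            yield
        w.strobe = True          # 9. STROBE high


def receiver(w, count, out):
    """ECP BACKWARD side; appends each received byte to out."""
    for _ in range(count):
        w.af = False             # 2. AF low: ready to receive
        while w.strobe:          # 4. wait for ACK to go low
            yield
        w.af = True              # 6. AF high
        out.append(w.data)       # 8. read data byte from PD7-0
        while not w.strobe:      # 10. wait for ACK to go high again (assumed)
            yield


def run(payload, tx_steps=1, rx_steps=1):
    """Round-robin scheduler: each round advances the sender tx_steps times
    and the receiver rx_steps times, so the two sides run at different speeds."""
    w, out = Wires(), []
    sides = [(sender(w, payload), tx_steps),
             (receiver(w, len(payload), out), rx_steps)]
    while sides:
        for side in list(sides):
            gen, steps = side
            for _ in range(steps):
                try:
                    next(gen)
                except StopIteration:
                    sides.remove(side)
                    break
    return out
```

For example, run(list(range(256)), tx_steps=9, rx_steps=1) returns the payload unchanged. In this model, dropping the step-10 wait makes the receiver see ACK still low and read stale bytes, which is why I read the diagram as implying it.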


The interlocked handshake is there to avoid buffer overruns. Sender and receiver wait indefinitely for the signals to change, so data does not just flow in, as opposed to the UART, for example. Just a note: the UART has a few modem control/status lines and the connector is at least a DB-9. I wish there were a hardware setting for an interlocked handshake using 2 of those extra lines... but there isn't, and connecting 2 UARTs where one side cannot keep up with the other will eventually overrun without extra software support. Always.

So the great benefit of ECP is this fully interlocked handshake, with no danger of overrun. The challenge, then, is to monitor time-outs in software in case the other guy is turned off or not working. This is not implemented here.

ECP cross-over cable

Now, STROBE goes to ACK and BUSY goes to AF on the other side. Since both ends are wired the same way, the ECP cross-over cable is truly symmetric: it can be turned around. It connects all 12 signals used by the ECP hardware for bidirectional data transfer (4 handshake lines and 8 data lines):

______                                         ______
STROBE   ----->   1  _____  _____   1   <----  STROBE    
___                       \/                   ___
ACK      <----   10  _____/\_____  10   ---->  ACK     
__                                             __
AF       ---->   14  _____  _____  14   <----  AF     
                          \/                          
BUSY     <----   11  _____/\_____  11   ---->  BUSY       

DATA0..7 <----                          ---->  DATA0..7
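
The wiring can also be written down as a pin-to-pin table, which makes the symmetry claim checkable: swapping the two connectors must give exactly the same set of wires. A small sketch using the standard DB-25 pin numbers from the diagram; the names DB25, CROSSOVER and is_symmetric are hypothetical. It lists only the 12 signal wires; I assume a real cable also needs at least one ground wire (pins 18-25), which the diagram does not show.

```python
# DB-25 pin numbers of the signals used by ECP (standard PC parallel port pinout).
DB25 = {"STROBE": 1, "ACK": 10, "BUSY": 11, "AF": 14}
DB25.update({f"D{i}": 2 + i for i in range(8)})  # D0..D7 are on pins 2..9

# One tuple per wire: (pin on end A, pin on end B).
CROSSOVER = [
    (DB25["STROBE"], DB25["ACK"]),   # A's STROBE drives B's ACK
    (DB25["ACK"], DB25["STROBE"]),   # B's STROBE drives A's ACK
    (DB25["AF"], DB25["BUSY"]),      # A's AF drives B's BUSY
    (DB25["BUSY"], DB25["AF"]),      # B's AF drives A's BUSY
] + [(DB25[f"D{i}"], DB25[f"D{i}"]) for i in range(8)]  # data goes straight through


def is_symmetric(wiring):
    """True if swapping the two connectors yields exactly the same cable."""
    return set(wiring) == {(b, a) for a, b in wiring}
```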

Haha, my home-made cable, made from an old SCART cable. SCART has 20+ wires, so it was perfect:

[fig]

ECP FIFO

The ECP hw has a 16-byte FIFO. In the forward direction (output), when the FIFO is written through an I/O port, the hw generates ECP write cycles on the wire and thereby empties the FIFO. Similarly, in the backward direction the hw generates ECP read cycles on the wire as long as the FIFO has free slots, and the FIFO is emptied by CPU or DMA I/O reads. There is no FIFO overrun thanks to the interlocked ECP handshake.

                                                                                     __   
                                                                                    | .\ 
                         +------------------------------+                           | . |
                         |                              |                           | . |  
  DMA      <-------->    |          ECP FIFO            |  <--------------------->  | . |
CPU PIO       R/W        |          16 bytes            |     read/write cycles     | . |
                         +------------------------------+                           | . | 
                                                                                    |_./   
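
Software sees the FIFO state through the Extended Control Register (ECR) at BASE+402h, defined by the Microsoft/HP ISA interface standard: bits 7-5 select the mode, and bits 1 and 0 report FIFO full and empty. A small decoder, as I understand that bit layout; the name decode_ecr is hypothetical, and the meaning of mode 100 varies between chips, so check your SIO datasheet.

```python
ECR_OFFSET = 0x402  # ECR sits at BASE+402h

ECR_MODES = {
    0b000: "SPP", 0b001: "PS/2", 0b010: "PP FIFO", 0b011: "ECP",
    0b100: "EPP (chip-dependent)", 0b101: "reserved",
    0b110: "FIFO test", 0b111: "configuration",
}


def decode_ecr(ecr):
    """Split an ECR byte into its fields."""
    return {
        "mode": ECR_MODES[(ecr >> 5) & 0b111],   # bits 7-5
        "nErrIntrEn": bool(ecr & 0x10),          # bit 4
        "dmaEn": bool(ecr & 0x08),               # bit 3
        "serviceIntr": bool(ecr & 0x04),         # bit 2
        "fifo_full": bool(ecr & 0x02),           # bit 1 (read-only)
        "fifo_empty": bool(ecr & 0x01),          # bit 0 (read-only)
    }
```

For example, an ECR value of 0x61 decodes as ECP mode with an empty FIFO.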

ECP command/data byte

There is a 3rd control line, driven by the ECP hardware in the forward direction: AF#.

It marks each byte as either a command or a data byte, and can be used for the built-in RLE decompression on the receiver side or by the driver to address different peripherals (e.g. the scanner or the printer part of the box). RLE and channels are not used in this test.

Writing into the ECP AFIFO at location BASE drives AF low ("address/RLE command"), while writing into the ECP DFIFO at location BASE+400h drives AF high (normal data byte). When AF# is low, the ECP protocol defines the data byte as one of two commands, selected by bit 7:

+---+---------------------------+         +---+---------------------------+       __
| 0 |  Run-Length Count (0-127) |         | 1 |   Channel Address (0-127) |       AF = 0
+---+---------------------------+         +---+---------------------------+
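
The two command formats can be encoded and decoded with a few bit operations. A sketch with hypothetical function names; I'm assuming the usual ECP reading of the run-length count, where a count of N means the following data byte appears N+1 times (so 127 means 128 bytes), which should be checked against the spec before relying on it.

```python
def rle_command(run_length):
    """Command byte announcing that the next data byte occurs run_length times (1..128)."""
    if not 1 <= run_length <= 128:
        raise ValueError("run length must be 1..128")
    return run_length - 1            # bit 7 = 0, bits 6-0 = count


def channel_command(channel):
    """Command byte selecting channel address 0..127."""
    if not 0 <= channel <= 127:
        raise ValueError("channel must be 0..127")
    return 0x80 | channel            # bit 7 = 1, bits 6-0 = address


def decode_command(byte):
    """Interpret a byte that was transferred with AF# low."""
    if byte & 0x80:
        return ("channel", byte & 0x7F)
    return ("rle", (byte & 0x7F) + 1)
```

On the sender side, such a command byte would go to the AFIFO at BASE and the data byte to the DFIFO at BASE+400h.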



Each ECP FIFO entry is extended with this TAG bit, which goes out on the wire the same way as the data bytes.

                                                                __                   __  
                         +------------------------------+       AF high/low         | .\  
                         |             TAG              |  ---------------------->  | . |
           BASE or       +------------------------------+                           | . |
          BASE+0400h     |                              |                           | . |  
CPU PIO  ------------->  |          ECP FIFO            |  ---------------------->  | . |
            write        |                              |      write cycle          | . |
                         +------------------------------+                           |_./  

ECP DOES NOT define what happens to the AF# line on the receiving side: sensing and handling command/data in the backward direction is chip-dependent for SIO chips with an ECP parallel port. My Winbond W83977EF does not support it. My National Semiconductor PC87309 fully supports RLE expansion, samples command/data on BUSY and stores it as a tag bit in the FIFO along with the data byte. The tag can be read through an ECP extended register, which makes the PC a proper ECP peripheral.

 __     
| .\     BUSY high/low     +------------------------------+  
| . |  ----------------->  |             TAG              | ---->  read BASE + 0405h  Extended Auxiliary Status Register (EAR) 
| . |                      +------------------------------+  
| . |                      |                              |  
| . |  ----------------->  |          ECP FIFO            | ---->  read BASE + 0400h  DFIFO 
| . |    read cycle        |                              |  
|_./                       +------------------------------+  
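
What a receiver has to do with the tagged entries can be modeled in software: channel commands switch the current channel, RLE commands repeat the next data byte. This is only a model of the semantics (the PC87309 does RLE expansion in hardware, as described above); the function name is hypothetical, and because the EAR tag bit and its polarity are chip-specific, the tag is passed in as a plain boolean. Assigning data that arrives before any channel command to channel 0 is my assumption.

```python
def process_tagged(entries):
    """entries: (is_command, byte) pairs in FIFO order.
    Returns {channel: bytearray} with RLE runs expanded."""
    channels = {}
    channel = 0        # assumption: data before any channel command belongs to channel 0
    repeat = 1
    for is_command, byte in entries:
        if is_command:
            if byte & 0x80:
                channel = byte & 0x7F          # channel address command
            else:
                repeat = (byte & 0x7F) + 1     # RLE: next data byte occurs count+1 times
        else:
            channels.setdefault(channel, bytearray()).extend(bytes([byte]) * repeat)
            repeat = 1                         # a run count applies to one data byte only
    return channels
```

For example, the entries (command 0x83, data 'A', command 0x02, data 'B', data 'C') end up as b"ABBBC" on channel 3.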

Because AF# and BUSY are already cross-connected in the cable, we can also experiment with RLE and channels using the PC87309 as the receiver.

ECP and ISA DMA

The ECP standard supports ISA DMA; that is how MS and HP designed it. Because I was interested in the full hw-supported speed, ISA DMA was the way to go: it is the only way to get full-speed ECP transfers.

The SIO chip is an ISA peripheral connected to the ISA bus DREQ/DACK, T/C and IRQn lines. The ECP hw can work in DMA mode: in the forward direction it asserts DREQ to fill its FIFO, and in the backward direction it asserts DREQ to empty its FIFO, while at the same time generating ECP cycles on the wire. An IRQ signals the end of the DMA transfer to the CPU: first the DMA controller asserts T/C (terminal count, end of transfer), then the ISA device asserts IRQn. All of this happens without CPU intervention:

               |        |                                                               __    
               |        |                                                              | .\   
+-------+      |  ISA   |         +-------------------------+                          | . |  
|       |      |        |         |  +---+---+---+---+---+  |  <-- PD7-0  ------->     | . |  
|  DMA  | <--> |  BUS   | <-----> |  |   |  FIFO |   |   |  |      STROBE ------->     | . |  
|  Cntl |      |        |   DREQ  |  +---+---+---+---+---+  |  <-- ACK                 | . |  
+-------+      |        |   DACK  |                         |      AF     ------->     | . |  
               |        |   R/W   |                         |  <-- BUSY                |_./   
+-------+      |        |   TC    |--+                      |                          DB-25
|       |      |        |         |  |                      |                         connector
|  INT  | <--- |        |<- IRQ   |<-+                      |
|  Cntl |      |        |         +-------------------------+                                     
+-------+      |        |                                       
               |        |                INT ENABLE
                                         DMA ENABLE
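
The "DMA ENABLE" box corresponds to register writes on the SIO side. The sketch below returns the (port, value) writes instead of performing them, so it runs anywhere; on the real machine each tuple would become one port output. The ordering (change direction only while in PS/2 mode, then enter ECP mode, then set dmaEn and clear serviceIntr) is my reading of the Microsoft/HP ISA interface standard, not the author's code, and all names are hypothetical. The function only touches DCR bit 5 and keeps the other DCR bits as passed in. The 8237 channel must already be programmed before dmaEn is set.

```python
ECR_OFFSET = 0x402
DCR_OFFSET = 0x002
MODE_PS2, MODE_ECP = 0b001, 0b011
DCR_DIRECTION = 0x20           # DCR bit 5: 1 = backward (input)


def ecr_value(mode, n_err_intr_en=True, dma_en=False, service_intr=True):
    """Compose an ECR byte; the FIFO status bits 1-0 are read-only and left 0."""
    return ((mode << 5) | (int(n_err_intr_en) << 4)
            | (int(dma_en) << 3) | (int(service_intr) << 2))


def ecp_dma_start(base, dcr, receive):
    """Port writes (port, value) that start an ECP DMA transfer."""
    ecr = base + ECR_OFFSET
    return [
        (ecr, ecr_value(MODE_PS2)),              # park in PS/2 mode, where direction may change
        (base + DCR_OFFSET,                      # set or clear only the direction bit
         (dcr & ~DCR_DIRECTION) | (DCR_DIRECTION if receive else 0)),
        (ecr, ecr_value(MODE_ECP)),              # enter ECP mode with DMA still off
        (ecr, ecr_value(MODE_ECP, dma_en=True, service_intr=False)),  # start DMA
    ]
```

With BASE = 0x378 (only an example address), receiving gives the writes 0x34 and 0x74, then 0x78 to port 0x77A, with the direction bit set in the DCR at 0x37A in between.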

Since the '90s, the parallel port function has lived in the SIO chip on the motherboard. It is connected to the South Bridge either over ISA or, on newer boards, through the LPC interface, and most of these chips support ECP. It's a long way from system memory to the wires of the PP: the I/O South Bridge (PIIX4) is responsible for collecting these DMA cycles and passing them up over the PCI bus to the North Bridge, which eventually generates the DRAM cycles. Here is an example for the Intel 440BX chipset:

            +-----------+
            |           |
            |    CPU    |
            |           |
            +-----------+
                  |
                  |
                  |
       +----------------------+
       |                      |              +------------+
       |     Host Bridge      |              |            |
       |       82443BX        |--------------|    DRAM    |
       |                      |              |            |
       |                      |              |            |
       |    +------------+    |              +------------+
       |    |   PCI      |    |
       |    | controller |    |
       +----------------------+
                 |                            
                 |
              ---------------------------------------------------------------- PCI BUS 33 MHz
                 |
                 |
       +----------------------+
       |                      |
       |       PIIX4          |
       |                      |----- USB PORT1/2 (PCI)
       |  82C59 Interrupt     |
       |  82C54 Timer         |----- IDE1/2 (PCI)
       |  MC146818A RTC       |
       |  82C37 DMA           |
       |                      |
       +----------------------+
                 |                            
                 |
              ---------------------------------------------------------------- ISA BUS 33/4 MHz
                 |
                 |
       +----------------------+                                 __     
       |                      |----- SERIAL PORT               | .\    
       |                      |----- PARALLEL PORT ----------- | . |   
       |   SUPER I/O CHIP     |----- KEYBOARD                  | . |   
       |                      |----- MOUSE                     | . |   
       |                      |----- FLOPPY                    | . |   
       +----------------------+                                | . |   
                                                               |_./    
                                                               DB-25   
                                                              connector

The ECP standard also limits the number of continuous DMA transfer cycles to 32, with some idle time in between. According to the PC87309 SIO chip docs, it may assert DREQ for 8-32 DMA cycles and then stay idle for at least 8 ISA clock cycles:

        _______________________
_______|   max 32 DMA cycles   |________________|   DREQ from SIO
                                   min 8 CLK

But on these modern chipsets, like the 440BX, we don't have to worry much about occupying the ISA bus anyway: the CPU is no longer on it, as it was in the PC/AT, and the SIO chip is probably the only peripheral on the ISA bus. To achieve maximum transfer speed, I set the DREQ burst to the maximum and the idle time to the minimum. But how many ISA clocks does one DMA cycle take? The ISA bus master is the South Bridge (PIIX4). It can run DMA cycles in compatible mode (5/8 SYSCLK?), but it also has a very nice feature that, according to my measurements, improves speed: Fast-DMA. In Fast-DMA mode one DMA cycle takes only 3 SYSCLK.
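
These numbers give a back-of-the-envelope ceiling for the ISA side alone: each burst of 32 one-byte DMA cycles costs 32 times the cycle length plus 8 idle clocks. A sketch assuming one byte per DMA cycle and SYSCLK = 33.33 MHz / 4 as in the 440BX diagram; it ignores DREQ/DACK arbitration, PCI and DRAM latency and the ECP wire handshake, so it is only an upper bound. The 5 and 8 clock values are simply both ends of the uncertain compatible-mode figure above, and the function name is hypothetical.

```python
def isa_dma_ceiling(sysclk_hz, clocks_per_cycle, burst=32, idle_clocks=8):
    """Upper bound in bytes/s for 8-bit ISA DMA when DREQ is asserted for
    `burst` one-byte cycles and then dropped for `idle_clocks` SYSCLKs."""
    clocks_per_burst = burst * clocks_per_cycle + idle_clocks
    return sysclk_hz * burst / clocks_per_burst


SYSCLK = 33_333_333 / 4        # ISA SYSCLK = PCICLK / 4, about 8.33 MHz

for clocks in (3, 5, 8):       # Fast-DMA, and both guesses for compatible mode
    print(clocks, "SYSCLK/cycle:", round(isa_dma_ceiling(SYSCLK, clocks) / 1e6, 2), "MB/s")
```

With 3 SYSCLK per cycle a burst takes 32 x 3 + 8 = 104 clocks, or roughly 2.56 MB/s (decimal) on the ISA bus.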

Also note that because of this jagged DREQ from the SIO chip, only DEMAND mode DMA can be programmed into the DMA CONTROLLER (BLOCK mode will not work).
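
Demand mode is selected in the 8237 mode register (port 0Bh for the first controller, channels 0-3): bits 7-6 pick the mode, bits 3-2 the transfer type, bits 1-0 the channel. A sketch that composes that byte; the function name is hypothetical, and which DMA channel the ECP port uses is set in the SIO configuration, so channel 3 in the examples is only an assumption.

```python
DMA1_MODE_PORT = 0x0B            # mode register of the first 8237 (channels 0-3)

DEMAND, SINGLE, BLOCK, CASCADE = 0b00, 0b01, 0b10, 0b11   # bits 7-6
VERIFY, WRITE, READ = 0b00, 0b01, 0b10                    # bits 3-2
# WRITE = device -> memory (receiving), READ = memory -> device (sending)


def dma_mode_byte(channel, transfer, mode=DEMAND, autoinit=False, decrement=False):
    """Value for the 8237 mode register."""
    if not 0 <= channel <= 3:
        raise ValueError("8-bit 8237 channels are 0-3")
    return ((mode << 6) | (int(decrement) << 5) | (int(autoinit) << 4)
            | (transfer << 2) | channel)
```

Sending on channel 3 in demand mode gives 0x0B and receiving gives 0x07; the same send in block mode would be 0x8B, which per the note above would not work with the SIO's jagged DREQ.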


Method

I fired up 2 ancient motherboards, both based on the 440BX chipset, after making sure they support ECP (by checking which SIO chip is on the motherboard). They booted my little home-brew OS via PXE or floppy. One sends 1 MB over the home-made ECP cross-over cable and the other receives it. Time is measured in ticks using Timer-0 set to 0x04A9, around 1 ms. Output goes through the serial port; both machines are connected to the dev PC running PuTTY.
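
The "around 1 ms" follows from the 8254 input clock of about 1.193182 MHz: a reload value of 0x04A9 = 1193 gives a tick of about 0.99985 ms, so millisecond readings are roughly 0.015% high, which is negligible here. A quick check with hypothetical helper names:

```python
PIT_HZ = 1_193_182            # 8254 input clock: 14.31818 MHz / 12


def tick_ms(divisor):
    """Length of one Timer-0 tick in milliseconds for a given reload value."""
    return divisor * 1000 / PIT_HZ


def ticks_to_ms(ticks, divisor=0x04A9):
    """Convert a tick count to real milliseconds."""
    return ticks * tick_ms(divisor)
```

For example, ticks_to_ms(1079) is about 1078.84 ms.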

Around 1 MB/s was measured from one PC's DRAM to the other's, using ISA DMA on both sides with the 440BX chipset switched to F-type DMA (3 ISA CLK). Nothing else is running, only one interrupt handler for the end of the DMA transfer; the CPU sits in HALT between interrupts. I've tested several mobos and the speed really varied: the worst was around 726 KB/s (an LPC SIO chip in an ASUS L4000L). I only managed to go above 1 MB/s between 2 mobos (ASUS P3B-F and DTK), both with the W83977EF SIO chip, reaching 1.1 MB/s.

ASUS L4000L laptop (SIO: LPC PC8739x family SID=EAh, SiS961 South Bridge, no tweaking)

2 x Dell GX1 mobo (SIO: ISA PC87309, 82371EB South Bridge (PIIX4E): F-DMA ON)

ASUS P3B-F mobo (SIO: ISA W83977EF SID=52h, 82371EB South Bridge (PIIX4E): F-DMA ON)

DTK PRM-0080I-E1 mobo (SIO: ISA W83977EF SID=52h, 82371EB South Bridge (PIIX4E): F-DMA ON)

Results

                 |                         RECEIVER: 
                 |  L4000L            GX1              ASUS P3B-F          DTK PRM-0080I-E1
-----------------+---------------------------------------------------------------------------------
SENDER:          | 
                 | 
                 |                  726 KB/s 
L4000L           |                 1410 ms   
                 | 
                 |                  817 KB/s                                    817 KB/s
GX1              |                 1253 ms                                     1253 ms
                 |                                                                         
                 |                 856 KB/s                                    1.06 MB/s
ASUS P3B-F       |                 1195 ms                                      938 ms 
                 |                                                  
                 |                 1.03 MB/s
DTK PRM-0080I-E1 |                  994 ms  
                 | 
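
The table figures appear consistent with 1 MB = 1 048 576 bytes, 1 KB = 1024 bytes and truncation to whole KB/s, e.g. 1410 ms gives 726 KB/s. This conversion is my reconstruction, not stated in the text, and the helper name is hypothetical:

```python
def kb_per_s(ms, size_bytes=1 << 20):
    """KB/s (1 KB = 1024 bytes) for size_bytes moved in ms milliseconds."""
    return size_bytes / 1024 / (ms / 1000)


# Reproduce the table cells: measured times in ms for 1 MB.
for ms in (1410, 1253, 1195, 994, 938):
    print(ms, "ms ->", int(kb_per_s(ms)), "KB/s")
```

int() truncates, which matches 856 KB/s for 1195 ms, where rounding would give 857.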

Conclusion

1. ASUS/DTK with ISA W83977EF is the winner!
2. Although it has the same chipset, the GX1 cannot send as fast as the ASUS/DTK boards, but it can receive almost as fast (why?).
3. The laptop, with a (newer) LPC PC8739x but a SiS chipset, is slow..

Some notes

Interestingly, in PIO mode the maximum transfer rate was only around 462 KB/s, and it occupies the CPU.
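
The CPU cost comes from polling: in PIO mode the CPU itself must watch the FIFO-full bit in the ECR (BASE+402h) before every write to the data FIFO at BASE+400h. A sketch of such a loop, not the author's code: inb and outb are hypothetical callables standing in for a bare-metal OS's port I/O helpers, passed in so the loop can be exercised against a simulated port.

```python
ECR_FULL = 0x02   # ECR bit 1: FIFO full
ECR_EMPTY = 0x01  # ECR bit 0: FIFO empty


def pio_send(data, inb, outb, base):
    """Hypothetical PIO sender: poll ECR.full, write each byte to the data FIFO,
    then wait until the hardware has put every byte on the wire."""
    dfifo, ecr = base + 0x400, base + 0x402
    for byte in data:
        while inb(ecr) & ECR_FULL:        # FIFO full: busy-wait, this is where CPU time goes
            pass
        outb(dfifo, byte)
    while not inb(ecr) & ECR_EMPTY:       # drain: wait for the last bytes to leave
        pass
```

With DMA the same waiting is done by DREQ/DACK in hardware, which is why the CPU can sit in HALT instead.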

Compatible vs. Fast-DMA on PIIX4E

Just out of curiosity, I connected the Saleae analyzer to IOW# on the ISA bus with PIIX4E F-DMA on. The promised 4-byte Fast-DMA transfer cycle is clearly visible:

[fig]



I also measured and compared the raw transfer time for 1 MB, which showed some (significant?) improvement:

Sender: ASUS P3B-F
Receiver: Dell GX1

Output from Sender (millisec):

->1079       <- here receiver F-DMA ON:  949 KB/sec
->1079
->1079
->1079
->1079
->1079
->1079
->1079
->1079
->1079
->1079
->28690      <- receiver rebooted.....
->1195
->1196
->1195       <- here receiver F-DMA OFF: 856 KB/sec
->1197
->1195
->1196
->1195
->1195
->1196
->1195
->1196
->1195
->1196
->1195
->1196
->1195
->1195
->1196
->1195
->1196
->1195



Sun Dec 23 18:52:51 UTC+0100 2018 © A. Tarpai