PCI32 4000 XLA Master Interfaces Version 3.0

March, 1999
Data Sheet

Xilinx Inc.
2100 Logic Drive
San Jose, CA 95124
Phone: +1 408-559-7778
Fax: +1 408-377-3259
E-mail: Techsupport: hotline@xilinx.com
        Feedback: logicore@xilinx.com
URL: http://www.xilinx.com/pci

Introduction

With Xilinx LogiCORE PCI32 4000 XLA Master Interfaces Version 3.0, a designer can build a customized, 32-bit, 33 MHz fully PCI compliant system with the highest possible sustained performance, 132 Mbytes/sec, and up to 124,000 system gates in an XC4000XLA FPGA.

Features

* Fully PCI 2.2 compliant 32-bit, 33 MHz PCI Interface
  - Master (Initiator/Target)
* Programmable single-chip solution with customizable back-end functionality
* Pre-defined implementation for predictable timing in Xilinx XC4000XLA FPGAs
* Incorporates Xilinx Smart-IP Technology
* 5 V and 3.3 V operation with XC4000XLA devices
* Zero wait-state burst operation
* Fully verified design
  - Tested on Xilinx internal testbench and in hardware (proven in FPGAs and HardWire devices)
* Configurable on-chip dual-port FIFOs can be added for maximum burst speed (see Xilinx Documents section)
* Supported Initiator functions
  - Memory Read, Memory Write, Memory Read Multiple (MRM), Memory Read Line (MRL) commands
  - I/O Read, I/O Write commands
  - Configuration Read, Configuration Write commands
  - Special Cycles, Interrupt Acknowledge
  - Bus Parking
  - Basic Host Bridging
* Supported Target functions (PCI Master and Slave)
  - Type 0 Configuration Space Header
  - Up to 3 Base Address Registers (memory or I/O with adjustable block size from 16 bytes to 2 GBytes, slow or medium decode speed)
  - Parity Generation (PAR), Parity Error Detection (PERR# and SERR#)
  - Extended Capabilities Registers (backend module)
  - Memory Read, Memory Write, Memory Read Multiple (MRM), Memory Read Line (MRL), Memory Write and Invalidate (MWI) commands
  - I/O Read, I/O Write commands
  - Configuration Read and Configuration Write commands
  - Interrupt Acknowledge
  - 32-bit data transfers, burst transfers with linear address ordering
  - Target Abort, Target Retry, Target Disconnect
  - Full Command/Status Register
* Available for configuration and download on the Web
  - Web-based configuration with intuitive GUI
  - Generation of proven design files
  - Instant access to new releases

LogiCORE(TM) Facts

Core Specifics
  Device Family            XC4000XLA
  CLBs Used                178 - 308
  IOBs Used (1)            53/51
  System Clock fmax        0 - 33 MHz
  Device Features Used     Bi-directional data buses
                           SelectRAM(TM) (optional user FIFO)
                           Boundary scan (optional)

Supported Devices/Resources Remaining
  Device                   I/O        CLB (1)
  XC4013XLA PQ208          99/101     268 - 398
  XC4013XLA PQ240          133/135    268 - 398
  XC4028XLA HQ240          133/135    716 - 846
  XC4062XLA HQ240          133/135    1996 - 2126
  XC4062XLA BG432          293/295    1996 - 2126

Provided with Core
  Documentation            PCI Design Guide
                           Implementation Guide
                           Conversion Guide
  Design File Formats      VHDL, Verilog Simulation Model
                           NGO Netlist (2)
  Constraint Files         UCF and Guide files
  Verification Tools       VHDL and Verilog Testbench
  Core Symbols             VHDL, Verilog
  Reference designs &      Ping Reference Design
  application notes        Synthesizable PCI Bridge Design
  Additional Items         PCI System Architecture (reference book)

Design Tool Requirements
  Xilinx Core Tools        M1.5i
  Verification Tools (3)   VHDL, Verilog

Support
  Xilinx provides technical support for all LogiCORE products when used as described in product documentation. Xilinx cannot guarantee timing, functionality, or support if implemented in unspecified devices or customized beyond that referenced in product documentation, or if changes are made to "DO NOT MODIFY" sections of the design.

Notes:
1. The exact number of CLBs depends on user configuration of the core and level of resource sharing with adjacent logic. Design size depends on the number and size of the BARs, and use of the latency timer.
2. Available on Xilinx PCI web site: www.xilinx.com/pci
3. Visit Xilinx web site for supported EDA tools

Applications

* PCI add-in boards such as graphic cards, video adapters, LAN adapters and data acquisition boards
* Embedded applications within networking, telecommunication, and industrial systems
* CompactPCI boards
* Other applications that need PCI

General Description

The LogiCORE(TM) PCI32 4000 XLA Master Interfaces V3.0 are pre-implemented and fully tested modules for Xilinx XC4000XLA FPGAs (see LogiCORE Facts for the list of supported devices). The pin-out and the relative placement of internal Configurable Logic Blocks (CLBs) are pre-defined. Critical paths are controlled by TimeSpecs and guide files to ensure that timing is always met. This significantly reduces the engineering time required to implement the PCI portion of your design. Resources can instead be focused on unique back-end logic in the FPGA and on system-level design. Consequently, LogiCORE(TM) PCI products can minimize development time.

[Figure 1: LogiCORE(TM) PCI32 4000 XLA Interface Block Diagram (BAR 2 not shown). Blocks: PCI I/O Interface, Parity Generator/Checker, PCI Configuration Space (Base Address Registers 0 and 1, Command/Status Register, Interrupt Pin and Line Register, Latency Timer Register, Vendor ID, Rev ID, Other User Data), Initiator State Machine, Target State Machine, User Application. Signals: AD[31:0], ADIO[31:0], PAR, PERR-, SERR-, FRAME-, IRDY-, REQ-, GNT-, TRDY-, DEVSEL-, STOP-. Drawing X7954.]

Xilinx XC4000XLA Series FPGAs enable the design of fully PCI compliant systems. The devices meet all required electrical and timing parameters for 3.3V and 5V, including AC output drive characteristics, input capacitance specifications (10pF), 7 ns setup and 0 ns hold to system clock, and 11 ns system clock to output. The PCI Compliance Checklist (see the Xilinx PCI Databook) has additional details about electrical compliance.
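The setup and clock-to-output figures quoted above can be checked against one 30 ns period of a 33 MHz clock. The following is a minimal sketch, not part of the core deliverables: the 7 ns and 11 ns values come from this data sheet, while the 10 ns propagation and 2 ns skew allowances are assumptions taken from the PCI Local Bus Specification budget for 33 MHz operation, and the function name is hypothetical.

```python
# Timing budget for one 33 MHz PCI clock period, in nanoseconds.
CLOCK_PERIOD_NS = 30.0   # 33 MHz bus clock
CLK_TO_OUT_NS = 11.0     # system clock to output, from this data sheet
SETUP_NS = 7.0           # input setup to system clock, from this data sheet
PROP_NS = 10.0           # assumption: bus propagation allowance (PCI spec budget)
SKEW_NS = 2.0            # assumption: clock skew allowance (PCI spec budget)


def timing_slack_ns(period, clk_to_out, prop, setup, skew):
    """Return the time left over after one driver-to-receiver transfer."""
    return period - (clk_to_out + prop + setup + skew)


slack = timing_slack_ns(CLOCK_PERIOD_NS, CLK_TO_OUT_NS, PROP_NS, SETUP_NS, SKEW_NS)
print(f"slack = {slack:.1f} ns")
```

With these numbers the budget closes with no margin, which is why the clock-to-output and setup limits have to be met exactly as specified.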
Other features that enable efficient implementation of a complete PCI system in the XC4000XLA include:

* SelectRAM(TM) memory: on-chip ultra-fast RAM with synchronous write option and dual-port RAM option, used in PCI Interfaces to implement the FIFO
* Individual output enable for each I/O
* Internal 3-state bus capability
* 8 global low-skew clock or signal distribution networks
* IEEE 1149.1-compatible boundary scan logic support

The module is carefully optimized for best possible sustained performance and utilization in the XC4000XLA FPGA architecture. When implemented in an XC4013, more than 50% of the FPGA's resources remain for integrating a unique back-end interface and other system functions into a fully programmable one-chip solution. When implemented in an XC4062, 90% of the FPGA's resources remain.

Smart-IP Technology

Drawing on the architectural advantages of Xilinx FPGAs, new Xilinx Smart-IP technology ensures highest sustained performance, predictability, repeatability, and flexibility in PCI designs. The Smart-IP technology is incorporated in every LogiCORE PCI Core.

Xilinx Smart-IP technology leverages the Xilinx architectural advantages, such as look-up tables (LUTs), distributed RAM, segmented routing, logic mapping and relative location constraints. This technology provides the best physical layout, predictability, performance, and significantly reduced compile times over competing architectures. The PCI32 4000 XLA Interface can be parameterized, which enables design flexibility and customization. PCI Cores based on Smart-IP technology are unique in that they maintain performance and predictability irrespective of device size.

Functional Description

The LogiCORE PCI32 4000 XLA Interfaces are partitioned into five major blocks, plus the user application, as shown in Figure 1. Each block is described below.

PCI I/O Interface Block

The I/O interface block handles the physical connection to the PCI bus, including all signaling, input and output synchronization, output three-state controls, and all request-grant handshaking for bus mastering.

Parity Generator/Checker

Generates/checks even parity across the AD bus, the CBE lines, and the PAR signal. Reports data parity errors via PERR- and address parity errors via SERR-.

Target State Machine

This block manages control over the PCI interface for Target functions. The states implemented are a subset of equations defined in "Appendix B" of the PCI Local Bus Specification. The controller is a high-performance state machine using state-per-bit (one-hot) encoding for maximum performance. State-per-bit encoding has narrower and shallower next-state logic functions that closely match the Xilinx FPGA architecture.

Initiator State Machine

This block manages control over the PCI interface for Initiator functions. The states implemented are a subset of equations defined in "Appendix B" of the PCI Local Bus Specification. The Initiator Control Logic also uses state-per-bit encoding for maximum performance.

PCI Configuration Space

This block provides the first 64 Bytes of Type 0, V 2.2, Configuration Space Header (CSH) (see Table 1) to support software-driven "Plug-and-Play" initialization and configuration. This includes Command, Status, and three Base Address Registers (BARs). These BARs illustrate how to implement memory- or I/O-mapped address spaces. Each BAR sets the base address for the interface and allows system software to determine the addressable range required by the interface. Using a combination of Configurable Logic Block (CLB) flip-flops for the read/write registers and CLB look-up tables for the read-only registers results in optimized packing density and layout.

With this release, the hooks for extending configuration space have been built into the backend interface. These hooks, including the ability to implement a CapPtr in configuration space, allow the user to implement functions such as Advanced Configuration and Power Interface (ACPI) in the backend design.

Table 1: PCI Configuration Space Header

  Offset    Contents (bits 31 down to 0)
  00h       Device ID [31:16], Vendor ID [15:0]
  04h       Status [31:16], Command [15:0]
  08h       Class Code [31:8], Rev ID [7:0]
  0Ch       BIST [31:24], Header Type [23:16], Latency Timer [15:8], Cache Line Size [7:0]
  10h       Base Address Register 0 (BAR0)
  14h       Base Address Register 1 (BAR1)
  18h       Base Address Register 2 (BAR2)
  1Ch       Base Address Register 3 (BAR3)
  20h       Base Address Register 4 (BAR4)
  24h       Base Address Register 5 (BAR5)
  28h       Cardbus CIS Pointer
  2Ch       Subsystem ID [31:16], Subsystem Vendor ID [15:0]
  30h       Expansion ROM Base Address
  34h       Reserved [31:8], CapPtr [7:0]
  38h       Reserved
  3Ch       Max_Lat [31:24], Min_Gnt [23:16], Interrupt Pin [15:8], Interrupt Line [7:0]
  40h-FFh   Reserved

Note: Italicized address areas are not implemented in the LogiCORE PCI32 4000 XLA Interface default configuration. These locations return zero during configuration read accesses.

User Application with Optional Burst FIFOs

The LogiCORE PCI32 4000 XLA Interface is a general-purpose interface with a 32-bit data path and latched address for de-multiplexing the PCI address/data bus. The general-purpose user interface allows the rest of the device to be used in a wide range of applications.

Typically, the user application contains burst FIFOs to increase PCI system performance (an Application Note is available; see the Xilinx Documents section). An on-chip read/write FIFO, built from the on-chip synchronous dual-port RAM (SelectRAM(TM)) available in XC4000XLA devices, supports data transfers in excess of 33 MHz.

Supported PCI Commands

Table 2 illustrates the PCI bus commands supported by the LogiCORE(TM) PCI32 4000 XLA Interfaces. The PCI Compliance Checklist, available in the Xilinx PCI data book, has more details on supported and unsupported commands.

Table 2: PCI Bus Commands

  CBE [3:0]   Command                    PCI Master   PCI Slave
  0000        Interrupt Acknowledge      Yes          Yes
  0001        Special Cycle              Yes          Ignore
  0010        I/O Read                   Yes          Yes
  0011        I/O Write                  Yes          Yes
  0100        Reserved                   Ignore       Ignore
  0101        Reserved                   Ignore       Ignore
  0110        Memory Read                Yes          Yes
  0111        Memory Write               Yes          Yes
  1000        Reserved                   Ignore       Ignore
  1001        Reserved                   Ignore       Ignore
  1010        Configuration Read         Yes          Yes
  1011        Configuration Write        Yes          Yes
  1100        Memory Read Multiple       Yes          Yes
  1101        Dual Address Cycle         No (1)       Ignore
  1110        Memory Read Line           Yes          Yes
  1111        Memory Write Invalidate    No (1)       Yes

Note:
1. The Initiator can present these commands; however, they either require additional user-application logic to support them or have not been thoroughly tested.

Interface Configuration

The LogiCORE PCI32 4000 XLA Interfaces can easily be configured to fit unique system requirements, either by using the Xilinx web-based graphical configuration tool or by changing the VHDL or Verilog configuration file. The following customization is supported by the LogiCORE product and described in the accompanying documentation.

* Initiator or target functionality (the core can be used as a target-only Interface)
* Base Address Register configuration (1 - 3 Registers, size and mode)
* Configuration Space Header ROM
* Initiator and target state machine (e.g., termination conditions, transaction types and request/transaction arbitration)
* Burst functionality
* User Application including FIFO (back-end design)

Burst Transfer

The PCI bus derives its performance from its ability to support burst transfers. The performance of any PCI application depends largely on the size of the burst transfer. A FIFO to support PCI burst transfer can efficiently be implemented using the XC4000XLA on-chip RAM feature, SelectRAM(TM). Each XC4000XLA CLB supports two 16x1 RAM blocks. This corresponds to 32 bits of single-ported RAM or 16 bits of dual-ported RAM, with simultaneous read/write capability.

Bandwidth

The Xilinx PCI32 4000 XLA Interfaces support a sustained bandwidth of up to 132 MBytes/sec.
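Returning to the SelectRAM figures given in the Burst Transfer section (two 16x1 RAM blocks per CLB, that is 32 bits single-ported or 16 bits dual-ported), a rough CLB count for the storage array of a burst FIFO can be sketched as follows. This is an illustrative estimate only: the function name is hypothetical, and the count covers RAM storage alone, not the address counters, flags, or output multiplexing a real FIFO also needs.

```python
import math

RAM_DEPTH = 16                               # each SelectRAM block is 16 words deep
BITS_PER_CLB = {"single": 32, "dual": 16}    # per-CLB capacity quoted in this data sheet


def fifo_storage_clbs(width_bits, depth_words, port_mode="dual"):
    """Estimate CLBs needed for the RAM array only (assumption: no control logic)."""
    columns_per_clb = BITS_PER_CLB[port_mode] // RAM_DEPTH  # 16x1 columns one CLB provides
    banks = math.ceil(depth_words / RAM_DEPTH)               # 16-word banks stacked for depth
    return math.ceil(width_bits / columns_per_clb) * banks


for depth in (16, 32, 64):
    print(f"32-bit x {depth}-word dual-port FIFO: {fifo_storage_clbs(32, depth)} CLBs")
```

Under these assumptions a 32-bit wide, 16-word deep dual-port FIFO needs 32 CLBs of storage, and each additional 16 words of depth adds another 32.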
The design can be configured to take advantage of the ability of the LogiCORE PCI32 Interface to do very long bursts. Since the FIFO isn't a fixed size, a burst can go on as long as the chipset arbiter will allow. Furthermore, since the FIFOs and DMA are decoupled from the proven core, a designer can modify these functions without affecting the critical PCI timing.

The flexible Xilinx backend, combined with support for many different PCI features, gives users a solution that lends itself to being used in many high-performance applications. Xilinx is able to support different depths of FIFOs as well as dual-port FIFOs, synchronous or asynchronous FIFOs, and multiple FIFOs. The user is not locked into one DMA engine; hence, a DMA that fits a specific application can be designed.

The theoretical maximum bandwidth of a 32-bit, 33 MHz PCI bus is 132 MB/s. How close you get to this maximum will depend on several factors, including the PCI design used, the PCI chipset, the processor's ability to keep up with your data stream, the maximum capability of your PCI design, and other traffic on the PCI bus. Older chipsets and processors will tend to allow less bandwidth than newer ones.

In this version of the Interface, all devices are zero wait-state except for the XC4062XLA HQ240, which is a one wait-state design. The XC4013XLA-09, XC4028XLA-09 and XC4062XLA-09 support zero wait-state burst, equal to a sustained bandwidth of up to 132 MBytes/sec. Only the XC4062XLA HQ240 requires one wait-state while sourcing data. See Table 3 for PCI bus transfer rates for various operations in either zero or one wait-state mode.
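The 132 MB/s figure, and the effect of burst length and wait-states on it, can be worked through numerically. The sketch below is illustrative only: it assumes that a pattern such as 3-1-1-1 in Table 3 means three clocks for the first data phase and one clock for each later one, it uses a nominal 33 MHz clock, and it ignores arbitration and idle cycles between bursts. The function names are hypothetical.

```python
BUS_WIDTH_BYTES = 4    # 32-bit data path
CLOCK_MHZ = 33.0       # nominal PCI clock used throughout this data sheet


def peak_rate_mbytes():
    """One 32-bit word per clock with no overhead."""
    return BUS_WIDTH_BYTES * CLOCK_MHZ


def burst_rate_mbytes(pattern, words):
    """Average rate for one burst of `words` data phases.

    Assumption: the first number in `pattern` is the clock count of the first
    data phase and the last number is the clock count of every later phase.
    """
    cycles = [int(c) for c in pattern.split("-")]
    first, later = cycles[0], cycles[-1]
    clocks = first + (words - 1) * later
    return words * BUS_WIDTH_BYTES * CLOCK_MHZ / clocks


print(f"peak: {peak_rate_mbytes():.0f} MBytes/sec")
for pattern in ("3-1-1-1", "3-2-2-2"):
    for words in (4, 64):
        rate = burst_rate_mbytes(pattern, words)
        print(f"{pattern}, {words:2d} words: {rate:.1f} MBytes/sec")
```

Under these assumptions a 4-word zero wait-state write averages 88 MBytes/sec while a 64-word burst reaches 128 MBytes/sec, which is why long bursts matter; the one wait-state pattern levels off at roughly half the peak rate.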
Table 3: LogiCORE PCI32 4000 XLA Transfer Rates

  Zero Wait-State Mode
  Operation                             Transfer Rate
  Initiator Write (PCI <- LogiCORE)     3-1-1-1
  Initiator Read (PCI -> LogiCORE)      4-1-1-1
  Target Write (PCI -> LogiCORE)        5-1-1-1
  Target Read (PCI <- LogiCORE)         6-1-1-1

  One Wait-State Mode
  Operation                             Transfer Rate
  Initiator Write (PCI <- LogiCORE)     3-2-2-2
  Initiator Read (PCI -> LogiCORE)      4-1-1-1
  Target Write (PCI -> LogiCORE)        5-1-1-1
  Target Read (PCI <- LogiCORE)         6-2-2-2

Note: Initiator Read and Target Write operations have effectively the same bandwidth for burst transfer.

In zero wait-state mode, no wait-states are inserted either while sourcing data or while receiving data. This allows a 100% burst transfer rate in both directions with full PCI compliance. No additional wait-states are inserted in response to a wait-state from another agent on the bus. Either IRDY or TRDY is kept asserted until the current data phase ends, as required by the V2.2 PCI Specification.

In one wait-state mode, the LogiCORE PCI32 4000 XLA Interface automatically inserts a wait-state when sourcing data (Initiator Write, Target Read) during a burst transfer. In this mode, the LogiCORE PCI32 4000 XLA Interface can accept data at 100% burst transfer rate and supply data at 50%.

Timing Specification

The XC4000XLA family, together with the LogiCORE PCI32 products, enables the design of fully compliant PCI systems. The back-end design can affect the maximum speed your design is capable of. Factors in your back-end design that can affect timing include loading of hot signals coming directly from the PCI bus, and gate count. Table 4 shows the key timing parameters for the LogiCORE PCI32 Interfaces that must be met for full PCI compliance.
Table 4: Timing Parameters [ns]

                                               PCI Spec.       LogiCORE PCI32 4000 XLA
                                                               XC4000XLA-1
  Parameter                          Ref.      Min     Max     Min       Max
  CLK Cycle Time                               30              30 (1)
  CLK High Time                                11              11
  CLK Low Time                                 11              11
  CLK to Bus Signals Valid (3)       TICKOF    2       11      2 (2)     8.5
  CLK to REQ# and GNT# Valid (3)     TICKOF    2       12      2 (2)     11
  Tri-state to Active                          2               2 (2)
  CLK to Tri-state                                     28                28 (1)
  Bus Signal Setup to CLK (IOB)      TPSD      7               7
  Bus Signal Setup to CLK (CLB)                7               7 (1)
  GNT# Setup to CLK                  TPSD      10              7
  GNT# Setup to CLK (CLB)            TPSD      10              10
  Input Hold Time After CLK (IOB)    TPHD      0               0
  Input Hold Time After CLK (CLB)              0               0 (2)
  RST# to Tri-state                                    40                40 (2)

Notes:
1. Controlled by TIMESPECS, included in product
2. Verified by analysis and bench-testing
3. IOB configured for Fast slew rate

Verification Methods

Xilinx has developed a testbench with numerous vectors to test the Xilinx PCI design; this is included with the LogiCORE PCI32 4000 XLA Master Interfaces. A version of this testbench is also used internally by the Xilinx PCI team to verify the PCI32 Interfaces. Additionally, the PCI32 Interfaces have been tested in hardware for electrical, functional and timing compliance.

The testbench shipped with the interface verifies the PCI interface functions according to the test scenarios specified in the PCI Local Bus Specification, V2.1; see Figure 2. This testbench consists of 28 test scenarios, each designed to test a specific PCI bus operation. Refer to the checklists chapter in this databook for a complete list of scenarios.

[Figure 2: PCI Protocol Testbench. Block labels: Target Functional Mode, LogiCORE PCI Interface, Initiator Protocol Test, User Application, Simple Arbiter. Module names: testbnch, faketarg, pci_lc_i, pcim_tst, fakearb. Drawing X7951.]

Ping Reference Design

The Xilinx LogiCORE PCI "PING" Application Example, delivered in VHDL and Verilog, has been developed to provide an easy-to-understand example which demonstrates many of the principles and techniques required to successfully use a LogiCORE PCI32 4000 XLA Interface in a System On A Chip solution.

Synthesizable PCI Bridge Design Example

Synthesizable PCI bridge design examples, delivered in Verilog and VHDL, are available to demonstrate how to interface to the LogiCORE PCI32 4000 XLA V3.0 Interfaces and provide a modular foundation upon which to base other designs. See separate data sheet for details.

Device Utilization

Utilization can vary widely, depending on the configuration choices made by the designer. Options that can affect the size of the core are:

* Number of Base Address Registers used. Turning off any unused BARs will save on resources. The core now includes a switch to force the entire deletion of unused Base Address Registers.
* Size of the BARs. Setting a BAR to a smaller size requires more flip-flops, because a smaller address space requires more flip-flops to decode.
* Latency timer. Disabling the latency timer will save a few resources. It must be enabled for bursting.

Recommended Design Experience

The LogiCORE PCI32 4000 XLA Interfaces are pre-implemented, allowing engineering to focus on the unique back-end functions of a PCI design. Regardless, PCI is a high-performance system that is challenging to implement in any technology, ASIC or FPGA. Therefore, we recommend previous experience with building high-performance, pipelined FPGA designs using Xilinx implementation software, TIMESPECs, and guide files. The challenge of implementing a complete PCI design, including back-end functions, varies depending on the configuration and functionality of your application. Contact your local Xilinx representative for a closer review and estimation for your specific requirements.