This post starts with a brief overview of how interrupts are implemented on PCI platforms.
There are three methods to implement interrupts on PCI platforms:
1) legacy interrupts
Devices attached to a PCI bus are equipped with external interrupt pins (up to four per device, A to D), each connected to a dedicated interrupt line on the bus. The PCI bus has a limited number of such out-of-band lines: four, named INTA, INTB, INTC and INTD.
To distribute the bus interrupt lines evenly across devices and reduce sharing, device interrupt pins and bus interrupt lines are multiplexed in the following way:
1st Device        2nd Device        3rd Device
PIN A - LINE A    PIN A - LINE B    PIN A - LINE C
PIN B - LINE B    PIN B - LINE C    PIN B - LINE D
PIN C - LINE C    PIN C - LINE D    PIN C - LINE A
PIN D - LINE D    PIN D - LINE A    PIN D - LINE B
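The rotation in the table boils down to a one-line formula. The sketch below is only an illustration: the function name and the zero-based numbering (pin A = 0, first device = 0) are assumptions made for this example, not something taken from a specification or from kernel code.

```c
#include <assert.h>

/* Pins and lines are numbered 0..3 for A..D (an assumption of this sketch). */
enum { INT_A, INT_B, INT_C, INT_D };

/*
 * Returns the bus interrupt line that a device pin ends up on.
 * dev_index is the zero-based column of the device in the table
 * (0 = 1st device, 1 = 2nd device, ...).
 */
unsigned int pci_pin_to_line(unsigned int dev_index, unsigned int pin)
{
    assert(pin <= INT_D);
    /* Each successive device shifts its pins by one line, wrapping at four. */
    return (pin + dev_index) % 4;
}
```

Since most devices only use pin A, the rotation spreads consecutive devices over lines A, B, C and D instead of piling all of them on line A.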
Since CPU interrupt pins are expensive and limited, PCI interrupt lines are connected to the input pins of an interrupt controller, PIC or APIC. Whenever an interrupt triggers, the interrupt controller asserts the CPU’s interrupt pin. The OS can figure out which input pin of the interrupt controller triggered the interrupt, and acknowledge it, by reading/writing the IO or memory-mapped registers of the interrupt controller.
The input pins of the interrupt controller are numbered and are referred to as IRQ lines. The device’s interrupt pin and IRQ line are reported in its PCI Configuration registers and can be viewed with the command:
$ lspci -b -vv
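For reference, the two values live in the standard configuration header: Interrupt Line at offset 0x3c and Interrupt Pin at offset 0x3d. Here is a minimal sketch that pulls them out of a raw copy of the configuration space. The struct and function names are made up for this example; only the two offsets and the pin encoding come from the PCI header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_INTERRUPT_LINE 0x3c  /* 8 bits: IRQ line assigned by firmware/OS */
#define PCI_INTERRUPT_PIN  0x3d  /* 8 bits: 0 = none, 1..4 = INTA..INTD */

/* Hypothetical helper type for this sketch. */
struct legacy_irq_info {
    uint8_t irq_line;
    char    pin;       /* 'A'..'D', or '-' when the device uses no pin */
};

/* cfg points to a raw dump of the device's configuration space. */
struct legacy_irq_info read_legacy_irq(const uint8_t *cfg, size_t cfg_len)
{
    struct legacy_irq_info info;
    uint8_t pin;

    assert(cfg_len >= 0x40);     /* need the standard 64-byte header */
    pin = cfg[PCI_INTERRUPT_PIN];
    info.irq_line = cfg[PCI_INTERRUPT_LINE];
    info.pin = (pin >= 1 && pin <= 4) ? (char)('A' + pin - 1) : '-';
    return info;
}
```

On Linux, such a dump can be read from the config file of the device under /sys/bus/pci/devices/.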
The above holds only for PCI buses. The PCI-Express bus, which is implemented as a point-to-point interconnect, has no dedicated out-of-band interrupt lines. Instead, devices attached to PCIe have to implement the MSI/MSI-X in-band interrupt mechanism to trigger interrupts (described below). However, for backward compatibility with drivers that do not support MSI/MSI-X interrupts, PCIe capable devices can emulate legacy interrupts.
2) MSI interrupts
MSI interrupts are in-band interrupts, i.e. no dedicated interrupt pins exist; interrupts are instead reported over the same lines used for data transfers. An MSI capable device triggers an interrupt by sending a special packet called an MSI Message. Whether a device is MSI capable is indicated by the MSI Capability structure in its PCI Configuration registers. A device can support up to 32 MSI Messages (interrupts). The number the device supports is reported in bits [3:1] of the Message Control field of the MSI Capability structure, while the number of MSI Messages actually enabled is written by software in bits [6:4] of the same register. Both fields hold the count as a power of 2, so you need to compute 1 << num to get the real number.
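To make the power-of-2 encoding concrete, here is a small sketch that decodes a Message Control value. The function names are invented for this example; the bit positions are the ones given in the paragraph above, plus bit 0, which is the MSI Enable bit. Treating encodings above 5 as reserved (returning 0) is a choice of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a 16-bit register, mirroring the [3:1] notation. */
static unsigned int reg_bits(uint16_t reg, unsigned int hi, unsigned int lo)
{
    assert(hi < 16 && lo <= hi);
    return ((unsigned int)reg >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Counts are log2-encoded; values above 5 (more than 32) are reserved. */
static unsigned int decode_count(unsigned int log2_count)
{
    return log2_count > 5 ? 0 : 1u << log2_count;
}

/* Bit 0: MSI Enable. */
int msi_enabled(uint16_t ctrl)
{
    return (int)reg_bits(ctrl, 0, 0);
}

/* Bits [3:1]: number of messages the device supports. */
unsigned int msi_vectors_supported(uint16_t ctrl)
{
    return decode_count(reg_bits(ctrl, 3, 1));
}

/* Bits [6:4]: number of messages software has enabled. */
unsigned int msi_vectors_allocated(uint16_t ctrl)
{
    return decode_count(reg_bits(ctrl, 6, 4));
}
```

For example, a Message Control value of 0x00a7 decodes to MSI enabled, 8 messages supported and 4 enabled.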
The implementation of MSI, like any interrupt mechanism, depends on the underlying hardware. When an interrupt triggers, the device issues a memory write: the value of the Message Data field of its MSI Capability structure is written to the address held in the Message Address field. On x86 platforms that address falls in the APIC range (0xFEExxxxx), so the write is routed to the local APIC of the destination CPU rather than to an input pin of the IO APIC. The Data value carries the interrupt vector, which the kernel maps to a virtual IRQ number.
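As an illustration of what ends up in the Message Address and Message Data fields on x86, here is a minimal sketch. It assumes the simplest case (physical destination mode, fixed delivery, edge triggered), and the function names are made up for this example; the kernel builds these values with its own helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_X86_ADDR_BASE 0xfee00000u  /* bits 31:20 select the APIC range */

/* Bits 19:12 of the address carry the destination APIC ID. */
uint32_t msi_x86_address(uint8_t dest_apic_id)
{
    return MSI_X86_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

/*
 * Bits 7:0 of the data carry the vector. All other bits are left at 0,
 * which means fixed delivery mode and edge trigger.
 */
uint16_t msi_x86_data(uint8_t vector)
{
    assert(vector >= 16);   /* vectors 0..15 are not valid APIC vectors */
    return vector;
}
```

Steering the interrupt to another CPU then only requires rewriting the destination ID in the address.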
To see whether MSI messages are enabled for a pci device, as well as the contents of its MSI Capability structure, search the lspci -vv output for the “Capabilities:” line that mentions “MSI:” (its Enable+ or Enable- flag shows whether MSI is currently enabled).
3) MSI-X interrupts
MSI-X is an enhancement to MSI. It can provide up to 2048 interrupts and supports a separate Address and Data pair for each vector. Whether a device is MSI-X capable, as well as the number of supported MSI-X vectors (i.e. the size of the MSI-X Table), is reported in the MSI-X Capability structure in its PCI Configuration registers. The MSI-X Capability structure also provides pointers to the MSI-X Table and the bit-per-vector Pending Bit Array (PBA), both of which reside in memory-mapped address space. The MSI-X Table holds the Address and Data for each supported MSI-X message, as well as a Control field used for masking the interrupt corresponding to this message. The PBA, as its name suggests, is an array of bits whose size equals the number of supported MSI-X vectors; each bit, if set, indicates a pending interrupt. When the device wants to deliver an interrupt, it sets the PBA bit of the corresponding MSI-X vector and, if the associated entry in the MSI-X Table does not have its Control field set to masked, the device writes Data to the memory location indicated in the Address field.
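The interplay between the MSI-X Table, the mask bit and the PBA is easier to follow in code. Below is a toy user-space model of the device side, following the description above. All names and the 8-entry table size are assumptions of this sketch, and the "memory write" is just recorded in the model rather than performed. The only hardware facts used are the four-dword entry layout, the mask bit being bit 0 of the Control field, and the Table Size field (bits [10:0] of Message Control, encoded as N-1, which is where the 2048 limit comes from).

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_NVEC          8     /* table size chosen for this sketch */
#define MSIX_ENTRY_MASKED  0x1u  /* bit 0 of the per-entry Control field */

struct msix_entry_model {
    uint32_t addr_lo, addr_hi, data, vector_ctrl;
};

struct msix_device_model {
    struct msix_entry_model table[MSIX_NVEC];
    uint64_t pba;                /* one pending bit per vector */
    uint64_t last_addr;          /* last message "sent", for inspection */
    uint32_t last_data;
    unsigned int writes;
};

/* Table Size is stored as N-1 in bits [10:0] of Message Control. */
unsigned int msix_table_size(uint16_t ctrl)
{
    return (ctrl & 0x07ffu) + 1u;
}

/* Emit the message for vector v and clear its pending bit. */
static void msix_send(struct msix_device_model *d, unsigned int v)
{
    const struct msix_entry_model *e = &d->table[v];

    d->last_addr = ((uint64_t)e->addr_hi << 32) | e->addr_lo;
    d->last_data = e->data;
    d->writes++;
    d->pba &= ~(1ull << v);
}

/* Device side: an interrupt condition occurred for vector v. */
void msix_raise(struct msix_device_model *d, unsigned int v)
{
    assert(v < MSIX_NVEC);
    d->pba |= 1ull << v;
    if (!(d->table[v].vector_ctrl & MSIX_ENTRY_MASKED))
        msix_send(d, v);
}

/* Driver side: unmasking delivers an interrupt that was left pending. */
void msix_unmask(struct msix_device_model *d, unsigned int v)
{
    assert(v < MSIX_NVEC);
    d->table[v].vector_ctrl &= ~MSIX_ENTRY_MASKED;
    if (d->pba & (1ull << v))
        msix_send(d, v);
}
```

Raising a masked vector only sets its pending bit; the message goes out once the driver unmasks the entry.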
To see whether a pci device is MSI-X capable, as well as the contents of its MSI-X Capability structure, search the lspci -vv output for the “Capabilities:” line that mentions “MSI-X:”.
What’s the main MSI/MSI-X benefit? Interrupt affinity
Aside from saving space by eliminating external interrupt pins, and aside from eliminating interrupt sharing, the MSI/MSI-X mechanism makes it possible to bind an interrupt to a specific CPU. Modern SMP systems that support process affinity see significant performance benefits from increased cache hits when the interrupt is delivered to the CPU running the process associated with this interrupt. For SMP systems with more than 32 cores, MSI-X eliminates the need for the re-vectoring logic that MSI requires in order to implement interrupt affinity.
After this “brief” description, let’s move to kernel stuff …
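On Linux, the affinity of an interrupt is exposed to user space as a hexadecimal CPU bitmask in /proc/irq/<irq>/smp_affinity. The sketch below only builds the file name and the mask string for a single CPU; the function names are made up for this example, and it is limited to CPUs 0..31 to keep the mask in one 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "/proc/irq/<irq>/smp_affinity" into buf; returns its length. */
int irq_affinity_path(char *buf, size_t len, unsigned int irq)
{
    return snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
}

/* Writes the hex bitmask that selects only the given CPU. */
int irq_affinity_mask(char *buf, size_t len, unsigned int cpu)
{
    assert(cpu < 32);   /* single-word masks only in this sketch */
    return snprintf(buf, len, "%x", 1u << cpu);
}
```

Writing the mask string to that file (as root) asks the kernel to deliver the interrupt to the selected CPU.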
The information related to each interrupt is stored in an ‘irq_desc’ structure. Each interrupt is referenced by a numeric value that indexes an interrupt descriptor table kept by the kernel. Each irq descriptor is associated with a list of handlers (more than one in the case of shared interrupts) via its ‘action’ field, which points to a chain of ‘irqaction’ structures. Physical IRQ lines (i.e. the input pins of interrupt controllers) are a limited resource. The number of available IRQ lines depends on the interrupt controller. On systems with master and slave PICs 16 IRQ lines are used, while on systems with APIC 32 IRQ lines are available, 8 of which can be used by PCI devices.
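The descriptor-plus-handler-list arrangement can be modelled in a few lines of user-space C. This is a deliberately simplified toy, not the kernel’s definitions: every type and function name below is invented for the sketch, and the sample handler pretends that dev_id points to the device’s “interrupt pending” flag.

```c
#include <assert.h>
#include <stddef.h>

enum irq_result { IRQ_NOT_MINE = 0, IRQ_SERVICED = 1 };
typedef enum irq_result (*handler_fn)(int irq, void *dev_id);

/* One registered handler; shared interrupts chain several of these. */
struct action_model {
    handler_fn handler;
    void *dev_id;
    struct action_model *next;
};

/* One descriptor per irq number, holding the head of the handler list. */
struct desc_model {
    struct action_model *action;
};

#define NR_IRQS_MODEL 16
static struct desc_model irq_table[NR_IRQS_MODEL];

/* Register a handler on an irq (pushed at the head for brevity). */
void model_add_handler(int irq, struct action_model *a)
{
    assert(irq >= 0 && irq < NR_IRQS_MODEL);
    a->next = irq_table[irq].action;
    irq_table[irq].action = a;
}

/* Call every handler on the line; each one decides if its device fired. */
int model_dispatch(int irq)
{
    struct action_model *a;
    int handled = 0;

    assert(irq >= 0 && irq < NR_IRQS_MODEL);
    for (a = irq_table[irq].action; a != NULL; a = a->next)
        if (a->handler(irq, a->dev_id) == IRQ_SERVICED)
            handled++;
    return handled;
}

/* Sample handler: dev_id points to the device's pending flag. */
enum irq_result sample_handler(int irq, void *dev_id)
{
    int *pending = dev_id;

    (void)irq;
    if (!*pending)
        return IRQ_NOT_MINE;
    *pending = 0;               /* acknowledge the device */
    return IRQ_SERVICED;
}
```

With two devices sharing a line, both handlers run on every interrupt and only the one whose device is actually pending reports it as serviced. This per-interrupt polling is what MSI/MSI-X avoids by giving each source its own irq number.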
The xhci driver, in its pci probe function, calls usb_hcd_pci_probe(), which in turn calls pci_enable_device(). When pci_enable_device() is called, the pci kernel code first checks whether the pci device is msi/msi-x capable. If it isn’t, the Interrupt Pin is read from its PCI Configuration registers and the kernel checks whether the IO APIC has an available IRQ line routed to this pin. If one is found, the kernel updates the Interrupt Line register in the device’s PCI Configuration Space, so that the driver knows which IRQ line to request interrupts on. If the device is msi/msi-x capable, the xhci driver needs to call pci_enable_msi_block() or pci_enable_msix(), respectively. The kernel uses the ‘msi_desc’ structure to store the information related to MSI interrupts (such as the virtual irq number, the last message etc).
pci_enable_msi_block() takes care of allocating and initializing the requested msi descriptors: it creates as many entries in the irq table as the number of MSI vectors, assigns the irq numbers, and writes the appropriate values to the Address and Data fields of the MSI Capability structure. The ‘msi_list’ field of the pci_dev structure can be used to find the assigned irq numbers when requesting an MSI interrupt.
pci_enable_msix() —–> TODO
After that, xhci can request to associate an interrupt to a handler using request_irq(). This request must be performed before the device is instructed to generate interrupts.
request_irq(irq_num, handler, irq_flags, name, driver)
The number of interrupts supported by the xHC host controller is reported via the MAX_INTRS field of the HCSPARAMS1 register. Normally, the same number should be reported in its MSI/MSI-X Capability structure. These interrupts are used to signal to the xhci driver that a new Transfer Event has been posted in one of its Event Rings. However, the current implementation of the driver uses only one Event Ring, with the intention to extend the number of Event Rings in the future, hence only one interrupt is used at the moment.
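For the curious, here is how that field can be pulled out of the register value. In the xHCI register layout the number of interrupters sits in bits [18:8] of the Structural Parameters 1 register (HCSPARAMS1), next to the slot and port counts. The function names are made up for this sketch, and the last helper is purely hypothetical: it only illustrates capping a wanted count at what the controller reports.

```c
#include <assert.h>
#include <stdint.h>

/* HCSPARAMS1: bits 7:0 MaxSlots, bits 18:8 MaxIntrs, bits 31:24 MaxPorts. */
unsigned int xhci_max_slots(uint32_t hcsparams1)
{
    return hcsparams1 & 0xffu;
}

unsigned int xhci_max_intrs(uint32_t hcsparams1)
{
    return (hcsparams1 >> 8) & 0x7ffu;
}

unsigned int xhci_max_ports(uint32_t hcsparams1)
{
    return (hcsparams1 >> 24) & 0xffu;
}

/* Hypothetical helper: never ask for more interrupters than exist. */
unsigned int xhci_pick_interrupters(uint32_t hcsparams1, unsigned int wanted)
{
    unsigned int max = xhci_max_intrs(hcsparams1);

    assert(wanted >= 1);
    return wanted < max ? wanted : max;
}
```

With a single Event Ring, as described above, the driver would only ever want one interrupter, whatever the controller advertises.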
When the xhci driver wants a transfer to trigger an interrupt on completion, or when a short packet is detected (that is, it wants the xHC to post a Transfer Event on the Event Ring and send an MSI/MSI-X message), it sets the IOC (Interrupt-On-Completion) or the ISP (Interrupt-on-Short-Packet) bit, respectively, to 1 and specifies the interrupter number (not the irq number) in the Interrupter Target field of the Transfer TRB.
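To show where these bits live, here is a minimal sketch that fills in a Normal Transfer TRB. The bit positions follow the xHCI TRB layout (ISP is bit 2 and IOC is bit 5 of the fourth dword, Interrupter Target is bits [31:22] of the third dword). The struct and function names are assumptions of this example, and the cycle bit, TD Size and the other flags are left at 0 to keep it short.

```c
#include <assert.h>
#include <stdint.h>

#define TRB_ISP             (1u << 2)    /* Interrupt-on-Short-Packet */
#define TRB_IOC             (1u << 5)    /* Interrupt-On-Completion */
#define TRB_TYPE(t)         ((uint32_t)(t) << 10)
#define TRB_TYPE_NORMAL     1u
#define TRB_INTR_TARGET(n)  (((uint32_t)(n) & 0x3ffu) << 22)
#define TRB_LEN(n)          ((uint32_t)(n) & 0x1ffffu)

/* A TRB is four 32-bit words. */
struct trb_model {
    uint32_t field[4];
};

struct trb_model make_normal_trb(uint64_t buf, uint32_t len,
                                 unsigned int interrupter, int ioc, int isp)
{
    struct trb_model t;

    assert(interrupter < 1024);  /* Interrupter Target is 10 bits wide */
    assert(len <= 0x1ffffu);     /* Transfer Length is 17 bits wide */

    t.field[0] = (uint32_t)buf;                   /* buffer pointer, low */
    t.field[1] = (uint32_t)(buf >> 32);           /* buffer pointer, high */
    t.field[2] = TRB_LEN(len) | TRB_INTR_TARGET(interrupter);
    t.field[3] = TRB_TYPE(TRB_TYPE_NORMAL)
               | (ioc ? TRB_IOC : 0u)
               | (isp ? TRB_ISP : 0u);
    return t;
}
```

Note that the interrupter number selects an Event Ring inside the xHC; which irq number that interrupter ends up on is decided by the MSI/MSI-X setup described earlier.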