
Foreseen performance problems with libchip (and other BIG PBS)




On Tue, 9 Feb 1999, VALETTE Eric wrote:

> then later in ns16559.c we see that each register is accessed
> via a call through the function pointers.

Another approach, considered but not implemented, was to make the libchip
drivers C source that you "include" from your BSP specific code, thereby
"instantiating" a new driver that is BSP specific.

Essentially all that changes is that the indirect calls become macros at
that point.  Not a big deal.
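
As a rough sketch of the difference between the two styles (every name
below is hypothetical and is not the actual libchip API; the register
file is simulated with an array so the fragment can be compiled on a
host):

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register file standing in for a memory-mapped UART. */
static uint8_t fake_uart_regs[8];

/* Style 1: the driver reaches the chip through a function pointer
 * supplied by the BSP, so every register access is an indirect call. */
typedef uint8_t (*get_register_f)(uintptr_t base, unsigned reg);

static uint8_t bsp_get_register(uintptr_t base, unsigned reg)
{
  assert(reg < 8);
  return ((volatile uint8_t *) base)[reg];
}

typedef struct {
  uintptr_t      base;
  get_register_f get_register;
} chip_entry;

static uint8_t read_status_indirect(const chip_entry *chip)
{
  return chip->get_register(chip->base, 5);   /* indirect call */
}

/* Style 2: the BSP defines the access as a macro before the driver
 * body is compiled, so the same access expands to a plain load. */
#define DRV_GET_REGISTER(base, reg) (((volatile uint8_t *) (base))[(reg)])

static uint8_t read_status_macro(uintptr_t base)
{
  return DRV_GET_REGISTER(base, 5);           /* no call at all */
}
```

The driver logic is the same in both cases; only the way a register
access is resolved changes.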

In many ways, libchip is always going to be a tradeoff between reuse,
complexity, and percentage of targets that can use it.  I did not (and
still do not) claim to have the ultimate solution to this problem.  Maybe
I have never said it on the list, but this is in many ways a strawman
implementation.  It is enough to demonstrate that it can be done and does
work.  But there is room for improvement, and there are portability issues
that have not been considered.

My number one goal was to ensure that the portability issues were tackled
in the driver.  The second part is tuning it for performance.

> I would like to recall that on PPC, motorola the function
> getRegister_f can be implemented by a single movl instruction
> and that using this API will imply to have for each interrupt
> several additional indirect function calls compared to normal 
> code execution.

On most CPUs, it can be.

> On modern processor architectures a function call costs much
> more than the number of processor cycles it requires to perform 
> it as it also implies :
> 
>      - two flushes of the instruction pipeline, and two reloads,
>      which is extremely costly especially for one instruction 
>      functions.
>      - more cache misses, more memory accesses,

I did not realize it would cost this much.

> I would really like someone with the relevant hardware equipment
> to benchmark a driver using this API and a normal driver. I would
> not be surprised to get 40% performance degradation...


We saw so little difference on the SONIC driver that you could not
reliably measure the difference in ttcp throughput.  

I do not have similar benchmarks for the serial drivers.

For the RTC drivers, I personally do not care as much about performance.

Also remember that most of the RTEMS BSPs have mediocre device drivers.
Many console drivers are simple dumb polled drivers with no termios
support.  One of the main purposes of libchip is to avoid having a stack
of mediocre z8530 drivers. :)

> Other problems :
> 
>       1) "getRegister_f" should be "getRegister_p" as the mapping of
>       PCI IO space and PCI ISA IO space is board dependent for example
>       on PPC (you can have a look at the definition of inb in the linux
>       source tree to be convinced...),

I won't argue this one but maybe we are disconnecting a bit here.  The
get/set reg routines are specified by the BSP on a per chip basis.  Whatever
is required to get to that chip can be done by a routine.  
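
For illustration only, here is what a BSP-supplied pair of routines
might look like on a made-up board where the chip's byte-wide registers
sit on 4-byte boundaries.  The board, the stride, and all names are
assumptions, and the I/O window is simulated with an array:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical board: 8 byte-wide registers, one every 4 bytes. */
#define MYBOARD_NUM_REGS   8
#define MYBOARD_REG_STRIDE 4

/* Stand-in for the real I/O window so the sketch runs on a host. */
static uint8_t myboard_io_window[MYBOARD_NUM_REGS * MYBOARD_REG_STRIDE];

/* The board-specific address arithmetic lives here, not in the driver. */
static uint8_t myboard_get_register(uintptr_t chip_base, unsigned reg)
{
  volatile uint8_t *p = (volatile uint8_t *) chip_base;
  assert(reg < MYBOARD_NUM_REGS);
  return p[reg * MYBOARD_REG_STRIDE];
}

static void myboard_set_register(uintptr_t chip_base, unsigned reg,
                                 uint8_t value)
{
  volatile uint8_t *p = (volatile uint8_t *) chip_base;
  assert(reg < MYBOARD_NUM_REGS);
  p[reg * MYBOARD_REG_STRIDE] = value;
}
```

The driver only ever asks for "register 3 of this chip"; how that maps
onto the bus is decided entirely inside the BSP routine.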

>       2) The code that needs to be executed to handle an interrupt
>       is BSP specific as the code that needs to be performed
>       to acknowledge the interrupt is usually dependent on the
>       interrupt management architecture (access to 8259 PIC to
>       reenable the interrupt needed if SET_VECTOR is called).
>       The actual code will never work on pc386 due to a more
>       powerful interrupt API for example...

I saw this coming but did not know how to address it.  I suspect the way
to do it is to turn the libchip ISR into an inline routine that is invoked
by a BSP dependent ISR.  
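
A minimal sketch of that split, assuming a 16550-style UART (receive
buffer at offset 0, interrupt identification register at offset 2 with
bit 0 clear when an interrupt is pending).  The function names, the
simulated registers, and the "acknowledge" counter are all hypothetical
stand-ins:

```c
#include <stdint.h>

#define UART_RBR 0   /* receive buffer register */
#define UART_IIR 2   /* interrupt identification register */

static uint8_t  sim_uart[8];     /* simulated chip registers */
static unsigned sim_pic_acks;    /* counts simulated interrupt acks */
static unsigned rx_count;
static uint8_t  last_rx;

/* Chip-generic part: what the shared driver would provide inline. */
static inline void chip_isr_body(volatile uint8_t *regs)
{
  if ((regs[UART_IIR] & 0x01) == 0) {   /* bit 0 clear: pending */
    last_rx = regs[UART_RBR];
    rx_count++;
  }
}

/* Board-specific part: however this board acknowledges interrupts. */
static void mybsp_ack_interrupt(void)
{
  sim_pic_acks++;
}

/* The ISR the BSP actually installs. */
void mybsp_uart_isr(void)
{
  chip_isr_body(sim_uart);    /* inlined, no indirect call */
  mybsp_ack_interrupt();      /* e.g. re-enable at the interrupt controller */
}
```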

Longer term, the implementation could be something like this:

#define NS16550_GET_REGISTER ...
#define NS16550_SET_REGISTER ...
#define NS16550_NAME_PREFIX MY_BSP_
#define NS16550_MAX_CHIPS 

#include <libchip/ns16550.c>

void MY_BSP_isr( void )
{
  /* board dependent stuff */
  /* invoke per chip ISR one or more times */
}

This would essentially instantiate a BSP specific version of the libchip
driver.  I think it would address your performance concerns.  

What board specific ISR stuff could not be handled this way?
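
One way the name prefix could be applied inside the included driver
source is preprocessor token pasting.  The sketch below is only an
assumption about how such a file might be written; the DRV_ macros and
the write_char routine are invented for the example, and both halves
are shown in one file so it compiles on its own:

```c
#include <stdint.h>

/* --- What the BSP would define before including the driver --- */
static uint8_t sim_regs[2][8];   /* simulated registers, two chips */
#define DRV_SET_REGISTER(chip, reg, val) (sim_regs[(chip)][(reg)] = (val))
#define DRV_NAME_PREFIX MY_BSP_
#define DRV_MAX_CHIPS   2

/* --- What the included driver source could contain --- */
/* Two levels of macro so DRV_NAME_PREFIX is expanded before pasting. */
#define DRV_PASTE2(a, b) a##b
#define DRV_PASTE(a, b)  DRV_PASTE2(a, b)
#define DRV_NAME(name)   DRV_PASTE(DRV_NAME_PREFIX, name)

/* Expands to: static int MY_BSP_write_char(unsigned chip, uint8_t c) */
static int DRV_NAME(write_char)(unsigned chip, uint8_t c)
{
  if (chip >= DRV_MAX_CHIPS) {
    return -1;                    /* no such chip on this board */
  }
  DRV_SET_REGISTER(chip, 0, c);   /* offset 0: transmit register */
  return 0;
}
```

Each BSP that includes the source with a different prefix gets its own
copy of the routines, with the register accesses resolved at compile
time.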

> So to summarize : 
> 
>    1) THE IDEA BEHIND LIBCHIP IS GOOD,
>    2) WHAT IS CURRENTLY DONE IS REALLY BAD AS FAR AS I HAVE 
>    ANY LEGITIMITY TO SAY THIS,

There are holes in it.  There are so many things that can vary.  I have
seen some really ugly things and wanted to cover the basics first.  

This is a first cut and this is the discussion I wanted.  If I had felt
this was in its final form, I would have pushed for its inclusion in 4.0.
:)

>    3) We should start a more interactive spec of libchip API,
> 
> Missing API I can already see :
> 
> 	typedef unsigned8 (*getRegisterSet_f)(...); to fetch 
> 	several contiguous registers
> 
> 	typedef unsigned (*getDataArea) (...)
>
> In particular for PCI devices with on board memory,
> accessing the data byte per byte is crazy...

Which driver is doing this?  I thought the existing libchip drivers were
all register oriented.

This would certainly help for some chips/boards.
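
A block-oriented accessor along those lines might look roughly like
this.  The signature is a guess (the proposal above leaves the
arguments as "..."), the names are hypothetical, and the on-board
memory is simulated with an array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: copy len bytes starting at offset. */
typedef void (*get_data_area_f)(uintptr_t base, size_t offset,
                                void *dst, size_t len);

/* Stand-in for memory on the card. */
static uint8_t sim_card_memory[64];

/* On a board where the card memory is directly addressable, one bulk
 * copy replaces len separate register-style calls.  Real device memory
 * may need volatile or width-restricted accesses instead of memcpy. */
static void myboard_get_data_area(uintptr_t base, size_t offset,
                                  void *dst, size_t len)
{
  assert(offset <= sizeof sim_card_memory);
  assert(len <= sizeof sim_card_memory - offset);
  memcpy(dst, (const uint8_t *) base + offset, len);
}
```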

The ultimate goal is for someone to be able to "cookie cutter" a BSP
together from libchip drivers by answering a series of questions and
writing some pretty painless routines.

To succeed, libchip will require help from the RTEMS community.
This is V1.  It is a concept demonstration.  The serial drivers have been
used on multiple boards from more than one CPU family.  If we need to turn
the implementation upside down, then so be it.

--joel