Hi,
I have been following this thread with interest. I was looking at the
watchdoginsert.c code (in 4.6.5) today and I can see what is causing
this issue, thanks to Steven's debugging effort.
The only way to protect against this is to ensure that if you are
inserting the new watchdog at the head of the watchdog list, we need
to ensure that it does not have a zero delta_interval, this watchdog
is effectively inserted after the tick and so should have a count of 1
still to go which will be decremented at the next clock isr.
The simple solution would be to check after the for loop if the
"after" watchdog is still active, and if not, then make sure the
delta_interval is greater than zero, in watchdoginsert.c:
} /* End of for loop */
_Watchdog_Activate(the_watchdog);
+ /* Check that the after watchdog is still active
+ * (not removed in the ISR_Flash via a clock isr) */
+ if(after->state != WATCHDOG_ACTIVE)
+ {
+ /* Ensure we have a delta_count of greater than 0
+ * since we haven't really started our timeout yet */
+ if(delta_count == 0)
+ detla_count++;
+ }
the_watchdog->delta_interval = delta_interval;
So, in my conclusion, I think Steven's initial change did not loose
any watchdog counts and in fact effectively provides the same result
as this code, but at least with this code, we know where it is coming
from rather than a solution to an unknown problem.
I hope this is correct and makes sense to everyone.
regards,
Ian Caddy
Steven Johnson wrote:
Hi Joel,
Since you are hacking on the source anyway, you could add a variable
which is set to note that you are at the flash and clear it when
interrupts are redisabled. Then when the 0 at the head of the delta
chain occurs there is a bread crumb. Several places could be marked
this way.
When you get the fault, check _ISR_Nest_level and check it.
We did the following test:
The ISR_Flash macro was modified to set "_ISR_In_Flash" on entry and
clear it on exit.
We added the following code to the watchdog tickle:
if (_ISR_Nest_level > 1)
_ISR_Nest_Count++;
if (prev_in_flash)
{
if (!the_watchdog->delta_interval)
{
the_watchdog->delta_interval = 1;
}
prev_in_flash = 0;
}
if (_ISR_In_Flash)
prev_in_flash = 1;
The results are these:
1. A breakpoint on the the _ISR_Nest_Count++ never triggered.
2. A breakpoint on "prev_in_flash = 1" triggered every so often.
3. A breakpoint on "the_watchdog->delta_interval = 1;" also triggered
So this seems to prove that what is happening is that interrupts are
being enabled after the delta_interval is being set to zero, but
before it is added to the watchdog chain. An ISR occurs (but not a
nested one) that removes all the entries from the head of the
watchdog chain. When the ISR returns, the zero calculated delta is
added, but instead of being after the entries in the chain it is
calculated on it is added at the head because all of the entries it
was calculated relative to have been removed. Which causes the next
watchdog tickle (following the one that occured during the flash) to
remove the head with 0 in it, decrement to 2^32-1, and exhibit our
fault.
I dont use simulators, but i imagine you could force this behaviour
by careful control of the simulator to exhibit these characteristics
for verification.
To also answer Jennifer,
On most of the PowerPC boards using new interrupt processing,
interrupts
are turned back on in C_dispatch_irq_handler allowing nested
interrupts. If this is occuring in your bsp, you may want to stub
out the sections
that do this and see if the problem goes away.
We dont actually use either the old or new PowerPC interrupt
processing in RTEMS, i did my own interrupt processing (loosely based
on the old powerpc interrupt processing) for my target years ago, and
have stuck with it cause its works fine for me and ive tuned it to my
liking for the mpc860 irq's. Which was why i was pretty sure we
shouldnt have nested IRQ's. The above test indicates that we arent
getting nested IRQ's in this instance, as _ISR_Nest_Count is never
incremented.
Hope that helps.
Steven Johnson