[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Event not timing out with a wait of 1 tick (and sometimes 2)



Hi,

We have finally been able to test Ian's fix, sorry for the delay.

Following is a brief description of what we did, because it didnt work for us, but we also had to change the code a little to do the test which may have broken it if our understanding isnt correct.

We assume that Ian meant delta_interval not delta_count as there is no delta_count variable (at least not where the compiler could see).

The compiler got tricky and read a bit from the condition register and added it to r8 to perform the "if zero do a ++" code. So we changed it to "if zero then set to 1". (so we could put a breakpoint on it)

We also had code in watchdog_tickle.c, that went, "if the_watchdog->delta_interval == 0 then the_watchdog->delta_interval = 1". A breakpoint on that "the_watchdog->delta_interval = 1" in watchdog_tickle.c got hit several times, but the one in watchdog_insert did not get hit.

Best Regards, and Merry Christmas,
Steven Johnson

Ian Caddy wrote:

Hi,

I have been following this thread with interest. I was looking at the watchdoginsert.c code (in 4.6.5) today and I can see what is causing this issue, thanks to Steven's debugging effort.

The only way to protect against this is to ensure that if you are inserting the new watchdog at the head of the watchdog list, we need to ensure that it does not have a zero delta_interval, this watchdog is effectively inserted after the tick and so should have a count of 1 still to go which will be decremented at the next clock isr.

The simple solution would be to check after the for loop if the "after" watchdog is still active, and if not, then make sure the delta_interval is greater than zero, in watchdoginsert.c:

   } /* End of for loop */

   _Watchdog_Activate(the_watchdog);

+   /* Check that the after watchdog is still active
+    * (not removed in the ISR_Flash via a clock isr) */
+   if(after->state != WATCHDOG_ACTIVE)
+   {
+      /* Ensure we have a delta_count of greater than 0
+       * since we haven't really started our timeout yet */
+      if(delta_count == 0)
+         detla_count++;
+   }

   the_watchdog->delta_interval = delta_interval;

So, in my conclusion, I think Steven's initial change did not loose any watchdog counts and in fact effectively provides the same result as this code, but at least with this code, we know where it is coming from rather than a solution to an unknown problem.

I hope this is correct and makes sense to everyone.

regards,

Ian Caddy



Steven Johnson wrote:

Hi Joel,


Since you are hacking on the source anyway, you could add a variable which is set to note that you are at the flash and clear it when interrupts are redisabled. Then when the 0 at the head of the delta chain occurs there is a bread crumb. Several places could be marked this way.


When you get the fault, check _ISR_Nest_level and check it.




We did the following test:

The ISR_Flash macro was modified to set "_ISR_In_Flash" on entry and clear it on exit.

We added the following code to the watchdog tickle:

if (_ISR_Nest_level > 1)
   _ISR_Nest_Count++;

if (prev_in_flash)
{
   if (!the_watchdog->delta_interval)
   {
       the_watchdog->delta_interval = 1;
   }
   prev_in_flash = 0;
}


if (_ISR_In_Flash) prev_in_flash = 1;

The results are these:

1. A breakpoint on the the _ISR_Nest_Count++ never triggered.
2. A breakpoint on "prev_in_flash = 1" triggered every so often.
3. A breakpoint on "the_watchdog->delta_interval = 1;" also triggered

So this seems to prove that what is happening is that interrupts are being enabled after the delta_interval is being set to zero, but before it is added to the watchdog chain. An ISR occurs (but not a nested one) that removes all the entries from the head of the watchdog chain. When the ISR returns, the zero calculated delta is added, but instead of being after the entries in the chain it is calculated on it is added at the head because all of the entries it was calculated relative to have been removed. Which causes the next watchdog tickle (following the one that occured during the flash) to remove the head with 0 in it, decrement to 2^32-1, and exhibit our fault.

I dont use simulators, but i imagine you could force this behaviour by careful control of the simulator to exhibit these characteristics for verification.

To also answer Jennifer,

On most of the PowerPC boards using new interrupt processing, interrupts
are turned back on in C_dispatch_irq_handler allowing nested interrupts. If this is occuring in your bsp, you may want to stub out the sections
that do this and see if the problem goes away.


We dont actually use either the old or new PowerPC interrupt processing in RTEMS, i did my own interrupt processing (loosely based on the old powerpc interrupt processing) for my target years ago, and have stuck with it cause its works fine for me and ive tuned it to my liking for the mpc860 irq's. Which was why i was pretty sure we shouldnt have nested IRQ's. The above test indicates that we arent getting nested IRQ's in this instance, as _ISR_Nest_Count is never incremented.

Hope that helps.

Steven Johnson