The "Out of mbuf clusters" problem, resolved
- Date: Wed, 15 Sep 2004 16:08:00 -0400
- From: wxdai at utstar.com (Wenxiang Dai)
- Subject: The "Out of mbuf clusters" problem, resolved
Could you send your patch to me? I have a similar problem with version 4.5.0.
Thanks in advance.
> -----Original Message-----
> From: Phil Torre [mailto:ptorre at zetron.com]
> Sent: Wednesday, September 15, 2004 2:05 PM
> To: RTEMS User List
> Subject: The "Out of mbuf clusters" problem, resolved
> In reference to my previous message, here's what I ended up doing to
> "fix" it.
> The deadlocked state that I was observing was caused when the RTEMS
> system was doing sustained file transmission via FTP, and receiving
> a mix of TCP ACKs and broadcast traffic (from chatty MS Windows boxes
> on our LAN). With the default mbuf/cluster pool sizes, we quickly
> run out of clusters. (Our Ethernet driver only allocates clusters
> for receive data, which makes matters even worse.)
> As soon as all clusters are exhausted, the receive task goes into
> its "waiting for clusters" loop. As incoming ACKs are processed,
> outbound packets are freed from the sockbuf by TCP, which frees up
> some clusters. But, there is a race condition between the receive
> thread and the application writing to the socket; they both want
> clusters, and the application is winning too much of the time. So,
> the incoming ACKs get lost, the outbound packets stay in the sockbuf
> pending retransmission, and there we sit.
> I expected that TCP would eventually time out and drop the connection,
> which should bring us back to life. It does, but manages not to free
> the outbound packets from the sockbuf. (This makes no sense to me,
> as it seems to guarantee that we will leak memory if a remote client
> hangs. But, it sat there wedged for 16 hours without recovering.
> That's close enough to forever for me.)
> So, I applied two fixes:
> 1) Deadlock recovery. I shortened tcp_keepidle to 30 seconds,
>    tcp_keepintvl to 10 seconds, and set always_keepalive. This
>    makes the connection time out in a few minutes rather than many
>    hours. Then I modified tcp_drop() so that if the connection is
>    being dropped due to timeout, both receive and send sockbufs and
>    any mbufs/clusters are explicitly freed.
> 2) Deadlock avoidance. To resolve the "receive thread is losing the
>    fight for clusters" problem, I modified m_clalloc() to respect a
>    global flag set by the receive thread when it is waiting for a
>    cluster. No one but the receive thread can get a cluster so long
>    as that flag is true.
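
The deadlock-avoidance idea in fix 2 can be sketched roughly as below. This is
not the actual patch: the names cluster_alloc(), cluster_free(),
rx_waiting_for_cluster and POOL_CLUSTERS are hypothetical stand-ins for
illustration, the pool is reduced to a counter, and locking is left out on the
assumption that the real code runs under whatever serialization the network
stack already provides.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_CLUSTERS 4   /* hypothetical tiny pool, for illustration only */

static int  free_clusters = POOL_CLUSTERS;

/* Hypothetical global flag: true while the receive thread is starved. */
static bool rx_waiting_for_cluster = false;

/*
 * Try to take one cluster from the pool.
 * caller_is_rx is true only when the receive thread is asking.
 * Returns true if a cluster was handed out.
 */
static bool cluster_alloc(bool caller_is_rx)
{
    /* While the receive thread is waiting, everyone else is refused,
     * even if clusters have been freed in the meantime. */
    if (rx_waiting_for_cluster && !caller_is_rx)
        return false;

    if (free_clusters == 0) {
        /* Pool is empty: a starved receive thread raises the flag so
         * that the next freed cluster is reserved for it. */
        if (caller_is_rx)
            rx_waiting_for_cluster = true;
        return false;
    }

    free_clusters--;

    /* The receive thread got its cluster, so lift the reservation. */
    if (caller_is_rx)
        rx_waiting_for_cluster = false;
    return true;
}

/* Return one cluster to the pool. */
static void cluster_free(void)
{
    assert(free_clusters < POOL_CLUSTERS);
    free_clusters++;
}
```

With this shape, once the pool is empty and the receive thread has asked once,
a cluster freed by an incoming ACK can only go to the receive thread, so the
application can no longer win the race described above.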
> With those two changes, my application is now rock-solid even under
> sustained heavy load with default pool sizes. I can offer patches if
> anyone is interested; I don't know if these changes are something
> that would be desirable to merge into RTEMS or not.
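
As a rough check on the "few minutes" figure in fix 1, the time before an idle
connection is dropped can be estimated as the idle time plus one probe interval
per unanswered keepalive probe. The sketch below assumes a probe count of 8,
which is the usual value in BSD-derived stacks but is not stated in the message
above; ASSUMED_KEEPCNT and keepalive_drop_seconds() are hypothetical names.

```c
#include <assert.h>

/* Assumption: number of unanswered probes before the connection is
 * dropped. Not given in the original message. */
#define ASSUMED_KEEPCNT 8

/*
 * Estimate, in seconds, how long an idle connection survives before
 * keepalive gives up: the idle period before the first probe, plus one
 * interval for each unanswered probe.
 */
static int keepalive_drop_seconds(int keepidle_s, int keepintvl_s, int probes)
{
    assert(keepidle_s >= 0 && keepintvl_s >= 0 && probes >= 0);
    return keepidle_s + probes * keepintvl_s;
}
```

Under that assumption, keepalive_drop_seconds(30, 10, ASSUMED_KEEPCNT) gives
110 seconds, which is consistent with a timeout of about two minutes.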
> Phil Torre                 phone: 425-820-6363 x234
> Design Engineer            email: ptorre at zetron.com
> Switching Systems Group    fax:   425-820-7031
> Zetron, Inc.               web:   http://www.zetron.com