Bug #20
Memory leak on Mac OS X ?
| Status : | Closed | Start : | ||
| Priority : | Low | Due date : | ||
| Assigned to : | John Goerzen | % Done : | 100% |
|
| Category : | Other | |||
| Target version : | - | |||
| Resolution : |
Description
Hi,
I like OfflineIMAP so much that I packaged it for Fink, so I am starting to get bug reports from angry users ;-) Here is one, the same actually happened to me a few times before. When e.g. moving a large mailbox on the server and then syncing the many messages in it, sometimes (and apparently on specific messages, but I have no clue about what the trigger is) offlineimap will die and give a stack trace like the one attached. Usually, running once with maxconnection=1 will work, and then everything is back to normal.
It has been suggested elsewhere that this might actually be a memory leak in Mac OS X's version of Python, see
http://sourceforge.net/tracker/index.php?func=detail&aid=1092502&group_id=5470&atid=105470
Anyone know why this is happening ?
/vincent
(Using component 'other' because it is not clear where the bug lies, or even if it is a bug of OfflineIMAP ...)
Associated revisions
Revision 10c2b6fbaa333a7fe4c58e5ef6b1186fdb5f8d7e
Apply new darwin.patch from Vincent Beffara
fixes #20
patch dated 5/27/2008
Revision 57492473823e42a2bc4d582a0186db2233b350f1
Revert "Apply darwin.patch from Vincent Beffara"
This reverts commit 9f5c8d708bfa9c16092255a59e13fe8171588c7a.
Several people were complaining about problems. See
http://bugs.debian.org/479677.
Closes: #479677.
refs #20.
Revision 9f5c8d708bfa9c16092255a59e13fe8171588c7a
Apply darwin.patch from Vincent Beffara
fixes #20
04/27/2008 09:34 AM - Vincent Beffara
- File darwin.patch added
OK, I finally installed an IMAPS server on my iMac, so I could test a
little further. For some reason, fixing the read() method in
imaplibutil.py did not seem to work (it hung on connecting to the
server) - however, modifying the file imapserver.py similarly to the
non-SSL case fixed the crash.
I also reduced the chunk size to 100k instead of 1M, as 1M seemed to
still trigger the memory error in some cases. Ah, and I added a
platform test, so that the patch does essentially nothing on
non-Darwin machines ...
So, still no guarantee, but the attached patch works for me. Any
comments ?
(If noone here screams in horror at my code, I will include the patch
into the Fink package and see what happens there.)
History
07/07/2007 11:19 PM - carson -
The problem seems to occur when syncing large messages. The program executed cleanly after removing these from my server.
11/01/2007 08:39 PM - vasi -
Hello fellow fink maintainer! Thanks for packaging offlineimap. The problem you're having is definitely the bug you linked to (though the link doesn't work anymore, the report is now located here: http://bugs.python.org/issue1092502 ). The good news is I have a workaround.
Here's a quick rundown of how and why it happens: Suppose offlineimap tries to fetch an email that's 10MB in size over SSL. The SSL socket allocates 10MB of memory, but because the SSL buffer is small, it only ends up with 1K of data. So the socket calls realloc() to resize the 10MB to 1K before returning the data. Imaplib notices it only got 1K, so it stashes the data it just got and asks for the rest of the 10MB, over and over, about 10,000 times until it's all read. But on OS X, realloc() doesn't actually do anything--so each of the blocks stays large. We have a total of 10,000 blocks x 5MB avg. per block = 50 GB of memory! A 32-bit processor can only handle 4GB per process, so python crashes.
I can verify this behavior occurs in both Mac OS X 10.3 and 10.4 (I don't yet have 10.5). Apple hasn't wanted to fix the problem, because technically it's within spec to ignore realloc() when it shrinks the size of something. The pythons folk haven't done so either, since they say that while OS X's behavior may be within spec, it's stupid. But we can work around it in offlineimap. The strategy is to change the way that IMAP4_SSL reads data, so that it doesn't keep copies of all those 1K-but-really-10MB blocks:
# Add to class UsefulIMAP4_SSL in imapserver.py: def read(self, size): read = 0 io = StringIO() while read < size: data = self.sslobj.read(size-read) read += len(data) io.write(data) return io.getvalue()
03/03/2008 08:40 AM - John Goerzen
Hi Vincent,
I apologize for not looking at this sooner.
I went and checked on the linked Python bug. It looks like they've committed a fix to both the 2.5 and the 2.6 tree on their end.
I am not sure that this patch is quite right; see the read method in imaplibutil.py and how it saves off its buffer. With the Python 2.5 fix, is it still important to apply the patch here?
03/03/2008 10:12 AM - John Goerzen
Vincent emailed me this:
Well at the time I just saw the whole thing as a black box, and since
the bug appeared in offlineimap I reported it here - if Python is fixed
regarding the whole realloc() behavior, I guess there is not much reason
to include it - IF everybody upgrades their Python version of course.
On Mac OS X, it used to be that the python coming with the system was
2.3, so one had to install a newer one (via e.g. Fink), and updates were
propagated too. Now they provide (pre-patch) python 2.5 ...
[Well, this answers my yet-unasked question, the solution is to still
install the python 2.5 in Fink to get the patch.]
04/17/2008 11:15 AM - Vincent Beffara
- File offlineimap.patch added
Just a status update - I put a patch into the fink package, which fixed the crash for me. Unfortunately, it didn't fix it for a few others who wrote to me since then ... I am attaching the patch file in case somone is interested in testing it.
Essentially, I am following vasi's way (see above) also for the non-SSL, non-tunnelcmd access mode, but I have to choose a non-stupid chunk size. I tried with 1Mo, it fixed it for me but apparently was not small enough. So I switched to 100k. I fear that a smaller chunk size would lead to severely degraded performance - admittedly better than a crash but affecting all e-mails and not only the huge ones.
BTW, the crash is quite elusive, so I can't test the patch as much as I'd like to. In particular, I have no mail with an IMAPS access (I do IMAP over an ssh tunnel), and I hear that the crash is not 100% fixed by the patch on IMAPS.
About the read() method in imaplibutil.py: yes, it probably should be patched there too ... or instead ?
04/27/2008 01:40 AM - yaw anokwa
- File trace.txt added
i'm using offlineimap 5.99.11 and python r251:54863 on 10.5.2 and i'm getting similar memory errors. i have attached a stack trace.
04/27/2008 07:17 AM - Vincent Beffara
Yep, that's the same issue, no doubt about it ! Hopefully python will be updated soon ...
BTW, you seem to be using Leopard's stock install of Python, but the version in Fink (r252:60911 at the moment) does not help at all. (Yet?)
04/27/2008 09:34 AM - Vincent Beffara
- File darwin.patch added
OK, I finally installed an IMAPS server on my iMac, so I could test a little further. For some reason, fixing the read() method in imaplibutil.py did not seem to work (it hung on connecting to the server) - however, modifying the file imapserver.py similarly to the non-SSL case fixed the crash.
I also reduced the chunk size to 100k instead of 1M, as 1M seemed to still trigger the memory error in some cases. Ah, and I added a platform test, so that the patch does essentially nothing on non-Darwin machines ...
So, still no guarantee, but the attached patch works for me. Any comments ?
(If noone here screams in horror at my code, I will include the patch into the Fink package and see what happens there.)
04/27/2008 04:16 PM - yaw anokwa
darwin.patch is working for me with offlineimap 5.99.11 and python r251:54863
05/01/2008 04:17 AM - yaw anokwa
- File trace.txt added
well that patch worked for a few days. now i'm getting a sslerror: (8, 'EOF occurred in violation of protocol') probably due to a big attachment. i've attached a trace.
05/01/2008 07:18 PM - John Goerzen
- % Done changed from 0 to 100
- Status changed from Assigned to Closed
Applied in changeset 9f5c8d708bfa9c16092255a59e13fe8171588c7a.
05/06/2008 08:14 AM - John Goerzen
- Priority changed from High to Low
- Status changed from Closed to Assigned
This patch caused problems for people. As I look at it closer, it looks like the non-Darwin bits aren't no-ops with the previous state (they should be calling self.read and the like)
05/06/2008 08:53 AM - Vincent Beffara
I guess I'm back to the drawing board then ... Hopefully the python people (frightening though their name might be ;->) will do something soon though !
05/07/2008 01:33 AM - T Joseph Carter
The patch applied upstream (switching min/max in Lib/socket.py) is only a bandaid to this problem. If you've got a better way to fix this in the mean time John, it would be appreciated. With the upstream patch, 4MB messages get by, but 9MB messages still kill it. It turns out I have been able to delete the largest of messages on the server (I no longer need them), but deleting mail so that I can sync it isn't exactly the way to go.
05/07/2008 03:55 PM - T Joseph Carter
The fink patch does fix my problems, no doubt about it. Perhaps there is more than one issue at work here?
If a reasonable workaround exists, it should go into offlineimap for users of Python 2.4. MacPorts has neither patched 2.4 nor 2.5 for this. The update earlier reporting that the upstream patch didn't really solve the problem was the result of me applying the 2.5 patch to MacPorts' 2.5.2 and trying it with the latest version of OfflineIMAP (see issue #67 on the site itself about how both fink and MacPorts (and at least the version of Gentoo I have laying about here) all try to download old versions of OfflineIMAP.)
05/08/2008 09:25 AM - Vincent Beffara
The fink patch does seem to fix the problem most of the time (fewer crashes, none that I know of when using non-SSL connections). I have to look deeper into the way things work to understand the problem with SSL, and I have no time at all for that these days - in the mean time, it is certainly better to leave the patch out of the way: better have a crash than e-mail corruption ...
About the version of OflineIMAP in fink : the version in the stable repository is definitely old. The version in unstable is 5.99.8 with a wrong patch; I put a more recent one in the tracker, but it's still there at the moment.
05/08/2008 10:12 PM - T Joseph Carter
Vincent, have you tried a combination of the offlineimap patch AND the Python patch? I've got both here, and the result seems to help a lot. Neither patch seems like a 100% solution on its own, which is a real shame because this is clearly a bug in Python--realloc is never guaranteed to actually free memory when shrinking a buffer, only when growing it if necessary. (And even then, most memory managers are lazy in freeing, so unused memory is only given up when the VM demands it..)
Either way, I noticed that fink's offlineimap generally tended to work, but macports' tends not to with large messages. I say tends because it still breaks now and then, but when it happens usually just rerunning it tends to fix the problem. Without the patch in offlineimap, it doesn't matter how many times you run it--same errno 12 on the realloc, every time.
A better solution really would be a good idea here.
05/09/2008 11:54 AM - Vincent Beffara
Well it is my understanding that the "new crash" exhibited by the patched version is of a different origin, as it is not a memory error but a protocol mismatch. I don't really see how combining both patches would help ...
Apparently the "slicing-up" version of read() defined by the patch is not completely equivalent to the original one. Probably because of an additional argument with a default value or a default behavior when the size argument is 0 ("read as long as there is something to read"), which would explain why there is some message from the IMAP server in what read() returns where it should be whatever comes next.
Again, the solution is to look at the precise semantics of the default read() method and to copy them while slicing it up as needed. Should be quite straightforward to do actually, but it will be time consuming, and time is sth I don't have a lot of these days ...
[Unless my interpretation is completely wrong of course !]
05/27/2008 03:10 PM - Vincent Beffara
- File darwin.patch added
Hi, it's me again ...
Here is a new version of the Darwin patch, where no-ops are really no-ops. There is indeed a subtlety about socket.read() that giving an argument of -1 should read until EOF, where the slicing-up actually returned nothing, hence the stuff was still in the buffer and raised a protocol error. So, 2 fixes : only try to be clever if the asked size is positive, and don't try to second-guess python's resolution method by explicitly calling the parent's read() instead.
This particular bug (negative argument) should only have happened on Darwin though ...
Once more, I cannot test it on as many systems as I'd like, so feedback is welcome !
/v05/28/2008 01:33 PM - T Joseph Carter
This fixes the memory problem, the protocol problem, and generally appears to be a working solution.
09/18/2008 09:23 AM - John Goerzen
- Status changed from Assigned to Closed
Applied in changeset 10c2b6fbaa333a7fe4c58e5ef6b1186fdb5f8d7e.