[personal profile] kickaha
No, you may *not* read a 4.8GB results file into memory, even with VM mapping, on a 32-bit system.
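(The arithmetic is unforgiving: a 32-bit process gets 2^32 bytes = 4GB of address space total, minus whatever the OS and libraries have already claimed, so a 4.8GB mapping can't even theoretically fit. The usual dodge is to map the file one window at a time. A minimal Python sketch of that approach; the window size is an arbitrary pick on my part, not anything from the actual run:

    import mmap
    import os

    # A 32-bit process can never map a 4.8GB file in one piece, but it
    # can map fixed-size windows of it. 256MB is an arbitrary choice;
    # it just has to be a multiple of mmap.ALLOCATIONGRANULARITY.
    WINDOW = 256 * 1024 * 1024

    def iter_windows(path, window=WINDOW):
        """Yield (offset, mmap view) pairs covering the file, one window at a time."""
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            offset = 0
            while offset < size:
                length = min(window, size - offset)
                view = mmap.mmap(f.fileno(), length,
                                 access=mmap.ACCESS_READ, offset=offset)
                try:
                    yield offset, view
                finally:
                    # Unmap before the next window, so the address-space
                    # footprint stays at one window no matter the file size.
                    view.close()
                offset += length

Each window is unmapped before the next is created, so only one window's worth of address space is ever in use.)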

heh

Date: 2005-05-28 07:27 pm (UTC)
From: [identity profile] cspowers.livejournal.com
This past semester, a co-worker and I sponsored a student project at NCSU. Basically, it was just a tool to do some number crunching to look for unusual behavior based on a user-specified tolerance level.

At first the team thought this was too simple a project and not that interesting. But when we pointed out that their data source was typically 20GB or more, and that there was no way they could possibly keep all their intermediate calculations in memory at once, they got kinda big-eyed.

In the end they came through with flying colors thanks to some mentoring from my co-worker T. I think it was the heftiest piece of design they'd ever done.

Re: heh

Date: 2005-05-28 11:00 pm (UTC)
From: [identity profile] kickaha.livejournal.com
*laugh* Yeah, large dataspace manipulation is a pain. Up until this, the largest dataset I'd worked with was ~200MB, and this one popped up unexpectedly in the middle of a bunch of tests.

I'm going to look into how to kick Python into reading the file a bit more intelligently (perhaps breaking it up into pieces of a GB or so, making sure I never split data that has to stay coherent during analysis across a chunk boundary), but it's a great justification for a Dual G5 and 10.4! :D
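Something along these lines is what I'm picturing; just a rough sketch, assuming records are newline-separated, with the chunk size and filename invented for illustration:

    # Rough sketch of chunked reading. Assumes records are separated by
    # newlines; path and chunk size are made up for illustration.
    CHUNK = 512 * 1024 * 1024  # half a GB per read

    def iter_records(path, chunk_size=CHUNK):
        """Yield one complete record at a time; never holds the whole file."""
        leftover = b""
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break  # EOF
                chunk = leftover + chunk
                # Hold back any partial record at the end of the chunk so
                # nothing gets split across a chunk boundary.
                cut = chunk.rfind(b"\n")
                if cut == -1:
                    leftover = chunk  # no separator yet; carry it all forward
                    continue
                leftover = chunk[cut + 1:]
                for record in chunk[:cut].split(b"\n"):
                    yield record
        if leftover:
            yield leftover  # final record with no trailing newline

    # Usage: the analysis loop walks one record at a time.
    # for rec in iter_records("results.dat"):  # hypothetical filename
    #     analyze(rec)

The point of the rfind() dance is that only whole records ever reach the analysis code, while memory use stays bounded by one chunk.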

"No, really, I *need* the big iron to do my research honey, I *swear*!" ;)
