05-05-2009, 05:59 PM   #1
calculating MD5 hash of a large file

Hi,

I've downloaded a large file (> 100 MB) to my Documents directory, and I need to calculate the file's MD5.

I can't load the entire file into an NSData object using initWithContentsOfFile: because that crashes the app due to lack of memory.

How can I load pieces of the file, say 10MB at a time, and calculate the MD5 incrementally?

Thanks
Posted by smithdale87

05-07-2009, 08:45 AM   #2

bump
Posted by smithdale87

05-07-2009, 10:47 AM   #3

Well, Objective-C supports all of regular C, so there's always the old-school fread to fall back on.
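
For reference, a minimal sketch of that fread approach in plain C. The names feed_file_in_chunks, chunk_fn, byte_tally and tally_chunk are made up for illustration, and the 4096-byte buffer is an arbitrary choice. The callback here only counts bytes so the sketch runs anywhere; on the iPhone it would presumably forward each chunk to CC_MD5_Update from <CommonCrypto/CommonDigest.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: called once per chunk, in file order. */
typedef void (*chunk_fn)(void *ctx, const unsigned char *buf, size_t len);

/* Illustrative callback state: counts the chunks and bytes it is handed. */
struct byte_tally {
    size_t calls;
    size_t total;
};

/* Stand-in for a digest update. With CommonCrypto the body would be
 * something like: CC_MD5_Update((CC_MD5_CTX *)ctx, buf, (CC_LONG)len); */
void tally_chunk(void *ctx, const unsigned char *buf, size_t len)
{
    struct byte_tally *t = ctx;
    (void)buf;               /* contents are ignored in this stand-in */
    t->calls += 1;
    t->total += len;
}

/* Read the file at `path` in fixed-size pieces and pass each piece to
 * `update`. Returns 0 on success, -1 if the file cannot be opened or a
 * read error occurs. Only one small buffer is ever held in memory. */
int feed_file_in_chunks(const char *path, chunk_fn update, void *ctx)
{
    unsigned char buf[4096];
    size_t n;
    int failed;
    FILE *fp;

    assert(path != NULL && update != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* fread returns 0 at end of file (or on error), which ends the loop. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        update(ctx, buf, n);

    failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}
```

With a real digest, the caller would initialise the context first (CC_MD5_Init), pass it as ctx, and finalise it (CC_MD5_Final) once the function returns 0.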
Posted by ziconic

05-07-2009, 12:28 PM   #4

Even 10MB is too much in one gulp. And there's no reason to read in such large chunks anyway.

Frankly, if you need help with this, you need to get a basic education in programming before you tackle an iPhone app. It sounds like you are just copying an example, and expect us to write your code for you.

But I'll try to help you help yourself anyway. Do you think that initWithContentsOfFile: is the only way to read data from a file? iPhone has an extensive API (actually, several APIs...) for reading files. How about hitting the "Help" button in Xcode and doing some exploring?

If that's too much, try the "Files" chapter in any iPhone development book. You do have one of those, don't you?

After you've done that, if there's something you don't understand, feel free to come back and ask.
Posted by jtara

05-11-2009, 12:56 PM   #5

Lol thanks ziconic for the fread idea. Thankfully I don't have to use C to do this.

Thanks jtara for the @$$hole response. I was merely looking for direction, as you pointed to the "Files" chapter, not a breakdown of how incompetent I was.


Here's my solution for others that may run into this same problem:

Code:
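#import <CommonCrypto/CommonDigest.h>

// CHUNK_SIZE is not defined in the post; it is assumed to be a #define elsewhere (e.g. 4096).
+(NSString*)fileMD5:(NSString*)path
{
	NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
	if( handle== nil ) return @"ERROR GETTING FILE MD5"; // file didn't exist or couldn't be opened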
	
	CC_MD5_CTX md5;

	CC_MD5_Init(&md5);
	
	BOOL done = NO;
	while(!done)
	{
		NSData* fileData = [handle readDataOfLength: CHUNK_SIZE ];
		CC_MD5_Update(&md5, [fileData bytes], [fileData length]);
		if( [fileData length] == 0 ) done = YES;
	}
	unsigned char digest[CC_MD5_DIGEST_LENGTH];
	CC_MD5_Final(digest, &md5);
	NSString* s = [NSString stringWithFormat: @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
				   digest[0], digest[1], 
				   digest[2], digest[3],
				   digest[4], digest[5],
				   digest[6], digest[7],
				   digest[8], digest[9],
				   digest[10], digest[11],
				   digest[12], digest[13],
				   digest[14], digest[15]];
	return s;
}
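
The sixteen-argument format string above can also be produced with a loop. A small sketch in plain C (which compiles as Objective-C too); digest_to_hex is a hypothetical helper name, not part of CommonCrypto, and the caller is assumed to supply a buffer of at least 2 * n + 1 chars.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the lowercase hex form of `digest` (n bytes) into `out`.
 * `out` must have room for 2 * n + 1 chars, including the final NUL. */
void digest_to_hex(const unsigned char *digest, size_t n, char *out)
{
    size_t i;

    assert(digest != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        /* Each byte becomes exactly two hex digits plus a NUL that the
         * next iteration overwrites. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned)digest[i]);
    }
    out[2 * n] = '\0';
}
```

In the method above this would be called with CC_MD5_DIGEST_LENGTH as n and a char buffer of 2 * CC_MD5_DIGEST_LENGTH + 1 bytes, and the result could then be wrapped with [NSString stringWithUTF8String:].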
Posted by smithdale87

07-10-2009, 10:41 AM   #6

Thanks for sharing.
Posted by beausejour

11-12-2009, 02:31 PM   #7

Quote:
Originally Posted by smithdale87
(the fileMD5: code from post #5)
@smithdale

Thanks for posting your solution. Do you have this running in a production environment? I tried it against some large files and saw RAM consumption shoot up; the autorelease pool is not getting a chance to drain inside the loop.

So I went with a more explicit alloc and release:

Code:
		NSData *fileData = [[NSData alloc] initWithData:[handle readDataOfLength:4096]];
		CC_MD5_Update(&md5, [fileData bytes], [fileData length]);
		
		if( [fileData length] == 0 ) {
			done = YES;
		}
		
		[fileData release];
Posted by edster

11-12-2009, 02:42 PM   #8

I wound up not using this code exactly as is.

Since I was downloading files from a server, I just updated the MD5 with each chunk of data as it was received. This way I never have to waste time calculating the MD5 of a huge file all at once.
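
A rough sketch of that idea in plain C: each chunk is written to disk and folded into the running hash in the same step, so the file never has to be re-read. Everything here (download, download_begin, download_receive, download_finish) is hypothetical naming, and the hash is a 64-bit FNV-1a stand-in, not MD5, used only so the sketch is self-contained. In the app the three marked spots would presumably be CC_MD5_Init, CC_MD5_Update and CC_MD5_Final.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-download state: output file plus running hash. */
typedef struct {
    FILE *out;
    uint64_t hash;   /* stand-in for a CC_MD5_CTX */
} download;

/* Open the destination file and reset the hash. Returns 0 on success. */
int download_begin(download *d, const char *path)
{
    assert(d != NULL && path != NULL);
    d->out = fopen(path, "wb");
    d->hash = 14695981039346656037ULL;   /* FNV-1a offset basis; CC_MD5_Init would go here */
    return d->out != NULL ? 0 : -1;
}

/* Call once per received chunk: update the hash and save the bytes. */
int download_receive(download *d, const void *bytes, size_t len)
{
    const unsigned char *p = bytes;
    size_t i;

    for (i = 0; i < len; i++) {          /* CC_MD5_Update would go here */
        d->hash ^= p[i];
        d->hash *= 1099511628211ULL;     /* FNV-1a prime */
    }
    return fwrite(bytes, 1, len, d->out) == len ? 0 : -1;
}

/* Close the file and return the finished hash (CC_MD5_Final would go here). */
uint64_t download_finish(download *d)
{
    fclose(d->out);
    d->out = NULL;
    return d->hash;
}
```

Because the hash is updated incrementally, the result does not depend on where the chunk boundaries fall; the same holds for the CC_MD5_Update calls in the code posted earlier.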

Where are the files coming from in your situation?
Posted by smithdale87

11-12-2009, 02:54 PM   #9

Quote:
Originally Posted by smithdale87
I wound up not using this code exactly as is.

Since I was downloading files from a server, I just updated the MD5 with each chunk of data as it was received. This way I never have to waste time calculating the MD5 of a huge file all at once.

Where are the files coming from in your situation?

Hmm, interesting. My files are coming from the internet, but they are downloaded at a different point in time. Maybe I could move the hash check back in time to the point of download. In this case, there are several files that make up a single object, so that object checks the hashes on all of the files before marking itself ready. In some cases, some file components might be 30-40MB videos.

Right now though, I'm not having a problem with the app consuming too much memory with the current implementation. It doesn't seem to take very long to calculate the hash, so I'm not too concerned. It's a background thing anyway. I was seeing memory shoot up over 30MB into danger land with your code above. Right now my app is stable at around 8MB.
Posted by edster

11-12-2009, 03:29 PM   #10

So your explicit alloc/release keeps memory usage under control, whereas the autoreleased objects (in my solution) cause problems?
Posted by smithdale87

01-25-2010, 05:22 PM   #11

Quote:
Originally Posted by smithdale87
So your explicit alloc/release keeps memory usage under control, whereas the autoreleased objects (in my solution) cause problems?

Time to circle back on an issue I thought was long resolved. Back when I first did this, it seemed like the explicit init and release was doing the trick. However, I'm getting some crash reports from an old iPod touch user where they are running out of memory.

So now I'm testing again and I'm seeing a huge spike in memory usage when it's calculating the hash. Without the MD5 check, my app stays around 9.5 - 11MB for its entire life, while downloading as much as 200-300MB in the background. If I enable the MD5 check, memory will shoot up as high as 60MB or so, which will kill old devices. Even after the spike, the memory seems to be creeping up over time even though 'Leaks' is not showing any memory leaking. So regardless of the memory spike, I think the hash routines are not freeing some allocated memory.

If you are having good luck with hashing the bytes as they stream down rather than on the file, I might look at switching to something like that.

The investigation continues...
Posted by edster

01-25-2010, 05:33 PM   #12

Yeah, I never had much trouble calculating the hash as the file was downloading. Perhaps that's the direction you should head in.
Posted by smithdale87

09-15-2010, 11:43 PM   #13
A solution that works to compute MD5 hash of large file with low memory consumption

Quote:
Originally Posted by edster
Time to circle back on an issue I thought was long resolved. Back when I first did this, it seemed like the explicit init and release was doing the trick. However, I'm getting some crash reports from an old iPod touch user where they are running out of memory.

So now I'm testing again and I'm seeing a huge spike in memory usage when it's calculating the hash. Without the MD5 check, my app stays around 9.5 - 11MB for its entire life, while downloading as much as 200-300MB in the background. If I enable the MD5 check, memory will shoot up as high as 60MB or so, which will kill old devices. Even after the spike, the memory seems to be creeping up over time even though 'Leaks' is not showing any memory leaking. So regardless of the memory spike, I think the hash routines are not freeing some allocated memory.

Even though you use a non-autoreleased object called fileData, you still have an autoreleased object there:

Code:
NSData *fileData = [[NSData alloc] initWithData:[handle readDataOfLength:4096]];
Remember that the result of -readDataOfLength: was allocated and autoreleased. So, in reality, your solution is worse than the one proposed by smithdale87, because you end up allocating the same objects, plus one.

I came up with an implementation that really works. I wrote an article about this efficient way to compute the MD5 hash of a large file.

I hope this helps.
__________________
Joel Lopes Da Silva
Posted by JoeKun

06-29-2011, 07:41 AM   #14

What about wrapping the reads from the file in an autorelease pool?

Code:
+(NSString*)fileMD5:(NSString*)path
{
	NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:path];
	if( handle== nil ) return @"ERROR GETTING FILE MD5"; // file didn't exist or couldn't be opened
	
	CC_MD5_CTX md5;

	CC_MD5_Init(&md5);
	
	BOOL done = NO;
	while(!done)
	{
		NSAutoreleasePool * pool = [NSAutoreleasePool new]; // fresh pool for this iteration
		NSData* fileData = [handle readDataOfLength: CHUNK_SIZE ];
		CC_MD5_Update(&md5, [fileData bytes], [fileData length]);
		if( [fileData length] == 0 ) done = YES;
		[pool drain]; // releases this iteration's autoreleased NSData
	}
	unsigned char digest[CC_MD5_DIGEST_LENGTH];
	CC_MD5_Final(digest, &md5);
	NSString* s = [NSString stringWithFormat: @"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
				   digest[0], digest[1], 
				   digest[2], digest[3],
				   digest[4], digest[5],
				   digest[6], digest[7],
				   digest[8], digest[9],
				   digest[10], digest[11],
				   digest[12], digest[13],
				   digest[14], digest[15]];
	return s;
}
Posted by cncool