In this thread, Alain wrote:

> > Everything works very fast for files up to about 500k, but as the file
> > size approaches the megabyte range, array creation slows way down. The
> > 4.2 MB King James Bible eventually causes a crash. It has me stumped. I
> > fear it may have something to do with the size of splitArray, but I can't
> > see how.
> >
> > I was hoping you or Robert P. or Jay could provide a faster Split
> > function that won't bog down. I'm reaching the limits of my skill here.
> > I really believe FB^3 should have a strong Split function to take the
> > pain out of array construction.
>
>I don't know why the crash.
>You must know that the dynamic array is automatically resized by the
>runtime every time it is necessary. The array is increased by default
>to hold 10 more items. You can change that increment by setting the
>global runtime variable gFBDynamicGrowInc. This can reduce the number
>of times the array must be resized and limit the slowdown
>significantly. Or you can address an item with a very high index so
>that the array is sized once and for all.
>Jay might have an explanation or a solution to this problem, I believe
>he has already worked with the King James Bible, however I don't know
>if he had been using dynamic arrays for that purpose.

Resizing gFBDynamicGrowInc seems to help, speedwise. I wish this was
documented.

>There is a bug in the Split function I provided. The following statement:
>
>Split = 0
>
>should be placed above the conditional structure (not inside). To
>avoid such problems, I am now used to writing all my local functions with
>the leading clear statement, and since I just cut and pasted the old
>function into your code to start with, I forgot to add it. I would
>encourage you to use it as well.

Fixed. Thanks.

>Perhaps it would be more efficient to load the text file directly
>into the container, avoiding the duplication of the data.
>Maybe you can replace your OpenFileToParse function with:
>
>local mode
>local fn OpenTextFileToContainer( @CPtr as ptr )
>dim f as FSSpec
>dim size as long
>
>long if len( files$( _FSSpecOpen, "TEXT", "Open file to parse", f ) )
>  if CPtr.nil& then DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>  on error end
>  open "I", #1, @f
>  size = lof( 1, 1 )
>  long if error = _noErr and syserror = _noErr and size > 0
>    CPtr.nil& = Fn NewHandle( size )
>    long if fn MemError = _noErr and CPtr.nil& != _nil
>      HLock( CPtr.nil& )
>      read file #1, [CPtr.nil&], size
>      HUnlock( CPtr.nil& )
>      long if error != _noErr or syserror != _noErr
>        DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>      end if
>    xelse
>      if CPtr.nil& then DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>    end if
>  end if
>  close 1
>  error = _noErr : syserror = _noErr
>  on error return
>end if
>end fn

Fixed.

>I have noticed that you delete parentheses, brackets and the like in
>the ParseContainer function, but I think those characters should be
>replaced with a space char instead.
>Now, if you take into account all the possible variants of those
>beasties (I don't know if you have all that menagerie in the English
>language), the replacement starts to take some time. In that case,
>since you are replacing one character with another, it should be
>faster to walk through the block of data and poke bytes when needed.
>You would lose some international compatibility brought by the
>ReplaceText function, but you would see a big difference in speed.
>
>dim as ptr startPtr, endPtr
>HLock([@gC])
>startPtr = [[@gC]]
>endPtr = startPtr + fn GetHandleSize( [@gC] )
>while startPtr < endPtr
>  select |startPtr|
>    case 9,10,13,34,194,_":",_"'",_"(",_")",_"{",_"}",_"[",_"]",_""",_"'",_"'",_"«",_"»",_"'",_"""
>      startPtr.nil`` = _" "
>  end select
>  startPtr++
>wend
>HUnlock( [@gC] )

On the beta list, Alain provided me with an even more efficient way of
parsing the file:

case < _"A" : startPtr.nil`` = _" "

Incorporating Alain's changes, the code is beginning to show real
promise. Thanks again to all, including those who have e-mailed off-list
with suggestions. We'll continue to hone it.

Anyone have an idea of how to break a container of ASCII text into
chunks, under the theory that it will take less time to parse and split
a smaller chunk into an array than it would a huge file?

Best,

Ken
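
On the chunking question, one possible approach is sketched below in C
rather than FB^3 (a language swap made only for illustration). It assumes
plain single-byte ASCII text already loaded into a buffer. The names
blank_below_A and next_chunk_len are hypothetical helpers, not part of the
FB runtime or the Toolbox. The first mirrors the `case < _"A"` byte walk
(note that it also blanks digits); the second picks a chunk length that
ends on a space, so no word is split across two chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every byte below 'A' with a space, in place.
   Bytes are treated as unsigned, so values above 127 are left alone. */
static void blank_below_A(unsigned char *buf, size_t len)
{
    unsigned char *p = buf;
    unsigned char *end = buf + len;   /* one past the last byte */

    while (p < end) {
        if (*p < 'A')
            *p = ' ';
        p++;
    }
}

/* Hypothetical helper: length of the next chunk starting at `pos`.
   The chunk is at most `max_len` bytes and, when possible, ends just
   after a space. If the window holds no space at all, it falls back to
   a hard cut at `max_len`, so the caller always makes progress. */
static size_t next_chunk_len(const unsigned char *buf, size_t len,
                             size_t pos, size_t max_len)
{
    size_t remaining, n;

    assert(max_len > 0 && pos <= len);

    remaining = len - pos;
    if (remaining <= max_len)
        return remaining;             /* last chunk: take what is left */

    /* Walk back from the end of the window to the nearest space. */
    n = max_len;
    while (n > 0 && buf[pos + n - 1] != ' ')
        n--;

    return n > 0 ? n : max_len;
}
```

A caller would start at pos = 0 and keep adding the returned length to pos
until pos reaches len, handing each chunk to Split in turn. Whether this
actually beats one pass over the whole file would need to be measured.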