[futurebasic] Re: Writing to a text file

From: Ken Shmidheiser <kshmidheiser@...>
Date: Wed, 16 Jul 2003 19:51:33 -0400
In this thread, Alain wrote:

>  > Everything works very fast for files up to about 500k, but as the file
>  > size approaches the megabyte range, array creation slows way down. The
>  > 4.2 MB King James Bible eventually causes a crash. It has me stumped. I
>  > fear it may have something to do with the size of splitArray, but I can't
>  > see how.
>  >
>  > I was hoping you or Robert P. or Jay could provide a faster Split
>  > function that won't bog down. I'm reaching the limits of my skill here.
>  > I really believe FB^3 should have a strong Split function to take the
>  > pain out of array construction.
>
>I don't know why it crashes.
>You should know that the runtime automatically resizes a dynamic array
>every time it is necessary. By default the array grows to hold 10 more
>items. You can change that increment by setting the global runtime
>variable gFBDynamicGrowInc. This can reduce the number of times the
>array must be resized and limit the slowdown significantly. Or you can
>address an item with a very high index so that the array is sized once
>and for all.
>Jay might have an explanation or a solution to this problem. I believe
>he has already worked with the King James Bible, but I don't know
>whether he used dynamic arrays for that purpose.

Raising gFBDynamicGrowInc seems to help, 
speedwise. I wish this were documented.
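
To see why the increment matters, here is a rough sketch in C (not FB) 
that counts how many resizes a fixed-increment dynamic array needs. The 
function name count_resizes and the growth model are assumptions made up 
for illustration; only the default increment of 10 comes from Alain's 
note, and the real runtime's resize logic may well differ.

```c
#include <assert.h>

/* Hypothetical model: the array starts empty and gains `inc` slots
   each time an item lands past its current capacity. */
static long count_resizes(long items, long inc)
{
    assert(inc > 0);
    long capacity = 0, resizes = 0;
    for (long i = 0; i < items; i++) {
        if (i >= capacity) {
            capacity += inc;   /* one resize buys `inc` more slots */
            resizes++;
        }
    }
    return resizes;
}
```

Under this model, count_resizes(800000, 10) returns 80000 while 
count_resizes(800000, 10000) returns 80 (the 800000 is an invented item 
count, not a measurement). If each resize also has to move the existing 
data, the total work grows much faster than the file size, which would 
fit the slowdown seen on megabyte-range files.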


>There is a bug in the Split function I provided. The following statement:
>
>Split = 0
>
>should be placed above the conditional structure (not inside it). To
>avoid such problems, I now write all my local functions with a leading
>clear statement, but since I just cut and pasted the old function into
>your code to start with, I forgot to add it. I would encourage you to
>use it as well.


Fixed. Thanks


>Perhaps it would be more efficient to load the text file directly
>into the container, avoiding the duplication of the data.
>Maybe you can replace your OpenFileToParse function with:
>
>local mode
>local fn OpenTextFileToContainer( @CPtr as ptr )
>dim f    as FSSpec
>dim size as long
>
>long if len( files$( _FSSpecOpen, "TEXT", "Open file to parse", f ) )
>if CPtr.nil& then DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>on error end
>open "I", #1, @f
>size = lof( 1, 1 )
>long if error = _noErr and syserror = _noErr and size > 0
>CPtr.nil& = Fn NewHandle( size )
>long if fn MemError = _noErr and CPtr.nil& != _nil
>HLock( CPtr.nil& )
>read file #1, [CPtr.nil&], size
>HUnlock( CPtr.nil& )
>long if error != _noErr or syserror != _noErr
>DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>end if
>xelse
>if CPtr.nil& then DisposeHandle( CPtr.nil& ) : CPtr.nil& = _nil
>end if
>end if
>close 1
>error = _noErr : syserror = _noErr
>on error return
>end if
>end fn


Fixed.


>I have noticed that you delete parentheses, brackets and the like in
>the ParseContainer function, but I think those characters should be
>replaced with a space char instead.
>Now, if you take into account all the possible variants of those
>beasties (I don't know if you have all that menagerie in the English
>language), the replacement starts to take some time. In that case,
>since you are replacing one character with another, it should be
>faster to walk through the block of data and poke bytes when needed.
>You would lose some of the international compatibility brought by the
>ReplaceText function, but you would see a big difference in speed.
>
>dim as ptr startPtr, endPtr
>HLock([@gC])
>startPtr = [[@gC]]
>endPtr   = startPtr + fn GetHandleSize( [@gC] )
>while startPtr < endPtr
>select |startPtr|
>case 9,10,13,34,194,_":",_"'",_"(",_")",_"{",_"}",_"[",_"]",_""",_"'",_"'",_"«",_"»",_"'",_"""
>startPtr.nil`` = _" "
>end select
>startPtr++
>wend
>HUnlock( [@gC] )


On the beta list, Alain provided me with an even 
more efficient way of parsing the file:

case < _"A" : startPtr.nil`` = _" "
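
For anyone following along outside FB, the same byte walk can be 
sketched in C. This is only an illustration of the technique under my 
own assumptions, not the code in the project; the name blank_below_A is 
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite every byte below 'A' (control codes, digits and most ASCII
   punctuation) with a space. Returns the number of bytes changed. */
static size_t blank_below_A(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {     /* stops before buf[len] */
        if (buf[i] < 'A' && buf[i] != ' ') {
            buf[i] = ' ';
            changed++;
        }
    }
    return changed;
}
```

Two things to keep in mind with the single comparison: digits sit below 
"A" and get blanked too, while [ ] { } and the high-ASCII quote 
characters sit above it and would still need their own case.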

Incorporating Alain's changes, the code is 
beginning to show real promise. Thanks again to 
all, including those who have e-mailed off-list 
with suggestions.

We'll continue to hone it.

Anyone have an idea of how to break a container 
of ASCII text into chunks, under the theory that 
it will take less time to parse and split a 
smaller chunk into an array than it would a huge 
file?
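
One possible shape for the idea, sketched in C rather than FB: cut 
each chunk at the last space inside a size limit so that no word is 
split. The name next_chunk_len and the limit are hypothetical, and the 
sketch assumes separators have already been blanked to spaces.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the next chunk starting at buf[0]: at most `max` bytes,
   cut back to just after the last space so no word is split.
   Falls back to a hard cut at `max` if that span holds no space. */
static size_t next_chunk_len(const unsigned char *buf, size_t len, size_t max)
{
    assert(max > 0);
    if (len <= max)
        return len;                 /* the remainder fits in one chunk */
    for (size_t i = max; i > 0; i--) {
        if (buf[i - 1] == ' ')
            return i;               /* chunk ends just after a space */
    }
    return max;                     /* no space found: hard cut */
}
```

A caller would advance an offset by the returned length until the 
whole buffer has been handed out, splitting each chunk in turn. 
Whether that actually beats one pass over the whole file is untested.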

Best,


Ken