[futurebasic] Re: [FB] Speed Help

From: Jay Reeve <jayreeve@...>
Date: Mon, 22 May 2006 01:08:34 -0500
On May 21, 2006, at 5:19 PM, David Cottrell wrote:

> Hi All
>
> Can anyone see a way to speed-up the function below?
>
> What I have is two text files which always start with a word and are
> delimited in some way (usually tabs). I need to build a list of
> matching lines. The problem is that the files are very long (one is
> all the words in English, the other Spanish).
>
> My approach is to build a dynamic array of words in the first list
> (fast) and then scan this list with each item in the other list (what
> this function does). The problem is it takes hours (or appears to).
>
> Any suggestions welcome.
>
> Cheers

David,

I couldn't actually test this, so it may be bug-infested, but I think
the concept is sound. It keeps the calculations and comparisons in
registers, minimizing reads from and writes to RAM. This should shave
a little time off your routine, but the thing that would make the
biggest difference would be to sort your gWordArrayF1 as you're
building it. Then you could use a MUCH faster search, such as a binary
search, to see if each word in the other file is there. A linear scan
makes up to n comparisons per word, while a binary search of a sorted
list makes about log2(n), so I expect it would cut the time at least
in half, and perhaps dramatically more! Is that a possibility? If the
array itself has to stay in file order, it would likely be worth
building a second array of indices and sorting that for the search.
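
The sorted-index idea above can be sketched roughly as follows. This is
an illustration in C rather than FutureBasic, and it is not code from
the thread: the names words, order, build_sorted_index and find_word
are hypothetical, and the five-word list stands in for the array built
from the first file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the word list built from file 1. */
static const char *words[] = { "pear", "apple", "fig", "banana", "cherry" };
enum { WORD_COUNT = sizeof words / sizeof words[0] };

/* order[] holds indices into words[]; sorting it leaves words[] untouched. */
static int order[WORD_COUNT];

/* qsort callback: compare the words that two indices refer to. */
static int compare_indices(const void *a, const void *b)
{
    return strcmp(words[*(const int *)a], words[*(const int *)b]);
}

/* Fill order[] with 0..WORD_COUNT-1, then sort it by the words. */
static void build_sorted_index(void)
{
    for (int i = 0; i < WORD_COUNT; i++)
        order[i] = i;
    qsort(order, WORD_COUNT, sizeof order[0], compare_indices);
}

/* Binary search through the sorted indices.
   Returns the position of target in words[], or -1 if it is absent. */
static int find_word(const char *target)
{
    int lo = 0, hi = WORD_COUNT - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(target, words[order[mid]]);
        if (cmp == 0)
            return order[mid];
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Call build_sorted_index() once after the first file is loaded, then
find_word() for each word of the second file. Because only the index
array is sorted, the value returned is still the word's original
position in the unsorted list.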

I may play with that a little further. Let me know if you need any  
help with bug extermination.

hth,
   e-e
   =J= a  y
    "


'------------------------------------------------------------
local
dim as ptr        p, p1, pF1
dim as long       pEnd, L, i, F1item, target

local fn scanFIle2 (delim as long)
  dim as long     handleSize

  if gFile2Hnd = 0 then exit fn
  handleSize   = FN GetHandleSize( gFile2Hnd )
  if handleSize < 8 then exit fn
  HLock(gFile2Hnd)

  p           = [gFile2Hnd]
  pEnd        = p + handleSize - 1
  gMatchCount = 0

  for p = p to pEnd'loop through each line

    for p1 = p + 1 to pEnd' Search for end of word (delim)
      long if p1.0`` < _"a"
        long if p1.0`` == delim' Found a valid word--process it
          L      = p1 - p - 1
          target = ( L << 8 ) + p.0``' Length and 1st char in one value
          pF1    = @gWordArrayF1( 0 )

          for F1item = 0 to gNoInIndex'Search for match in gWordArrayF1
            long if target = pF1.0%' Len & 1st char match

              i = L' Scratch counter, so L stays intact for the next item
              while i > 1
                if | p + i | <> | pF1 + i | then exit "nextWord"
                i --
              wend

              gMatchCount++' Words match
              gMatchedItems(gMatchCount) = F1item
            end if
            "nextWord"
            pF1 += sizeOf( gWordArrayF1( 0 ) )' Go to next word in array
          next

          exit for' Word handled--stop scanning this line for delims

        xelse' Not a valid char

          for p = p1 + 1 to pEnd' Bad word--move to next
            if p.0`` == delim then exit for
          next

          exit for' Bad word skipped--leave the delim search
        end if
      end if
    next p1

    // ??? Can this go before the loop ???
    //IF SYSTEM(_sysVers) => 1000 THEN FN KillSpinningCursor
    p = fn skipToEndOfLine( p )'error

  next p

  gDataReady = _True
  compress dynamic gMatchedItems

end fn
'------------------------------------------------------
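
The prefilter used in the listing (test the length and first character
as a single value before comparing the rest of the word) can be
sketched in C as below. This is an illustration, not a translation of
the FutureBasic above: quick_key and same_word are hypothetical names,
and the key is built from explicit bytes instead of a 16-bit read, so
it does not depend on byte order or on how gWordArrayF1 stores its
strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack the low byte of the length and the first character into one
   16-bit key. An empty word gets key 0. */
static uint16_t quick_key(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;
    return (uint16_t)(((len & 0xFF) << 8) | s[0]);
}

/* Return 1 if the two words are identical, 0 otherwise. Most
   non-matching pairs are rejected by the single key comparison. */
static int same_word(const unsigned char *a, size_t alen,
                     const unsigned char *b, size_t blen)
{
    if (quick_key(a, alen) != quick_key(b, blen))
        return 0;                     /* cheap reject */
    /* The key keeps only the low byte of the length, so confirm it. */
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

In a search loop the key for the target word would be computed once,
outside the loop, and compared against each candidate's key, which is
what the target variable does in the listing.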