Robert,
I havent seen your code but I have a suggestion. I have a program that does
something very similar to yours, using a similar concept.
My program was not intended for the challenge.
When using a hash table you'll want do an insertion sort. Its going to be
way faster than any other sort including a quicksort. This is assuming that
you are your data is placed in order by hashvalue, not by string. It will
still be faster even if you use the string as your sort key because the
array will be sorted from the start.
You'd do the sorting by finding the insertion point and blockmoving all
array reference below it down one level (push it down) something like this
blkToMove = @myArray(index) - @myArray(gLastEntry)
blockmove @myArray(index),@myArray(index +1),blkToMove
then filling in the values in the newly freed array index.
I beat the SNOT (thats a technical term) out of out of every sort algorithm
that was available to me including Quicksort.
Also, you can preflight the search by diving the list in half (numerically)
and test whether you should begin to search and store the new value in the
upper or lower half. Doing anythig more than an uppper/lower half test
wastes too much time. A binary search of an already sorted array is much
slower than a preflight test. By the time you've done your binarch search
you could have already begun inserting.
This isnt speculation, I've done recently in a program.
Currently my program, not intended for the same purpose but achieves much
the same results STORES roughly 69000 (not 6900) unique words in 152 seconds
on a G3 233 in PPC mode. It does not count although that functionality
could be easily added.
I dont know if it was explained in the rules but I would think that any word
differing in case should be considered a new word
JACK, JaCK, JaCK, JaCk
should all be considered separate words. This puts additional stain on the
program and the algorithm. I think its too easy allow a word to be the same
whether a lookup table is used or CALL UPPERTEXT.
BTW,
If ANYONE figures out how to generate a hash value that is numerically
equivalent to its alpha counterpart please let me know. What I mean is that
the hash value when sorted will also sort the alpha key in the same order.
I dont know if such a thing exists but I've been trying to figure it out.
W.
-----Original Message-----
From: Robert Purves [mailto:robert.purves@...]
Sent: Monday, September 18, 2000 07:15
To: futurebasic@...
Subject: Re: Re: [FB] Benchmark Challenge #3 - WFC
>' '''''''' john's joh'ns 'john's 'jack 'jack' 'jack's
>Would give:
>JACK (2)
>JACK'S (1)
>JOHN'S (2)
>JOH'NS (1)
>
>The intent is to prevent "Jack's" from becoming "JACKS" or "JACK" and "S"
>- not to complicate life unnecessarily!
>The input file will be normal English ASCII text; specifically, some
>document I find sitting around on my hard drive (although others may use
>other documents, of course). It is safe to assume that there will be far
>less than 8000 different words, at least in the "limited to 32K" case.
After the above clarifications, I have posted a solution at
<ftp://ftp.futurebasic.org/futurebasic/dropbox/Challenge_3_Robert_P.sit>
The download is 12K.
My method is:
1. Scan the text, character by character, converting each character to
upper case with a lookup table. During the scan, collect unique words only
(rejecting duplicates but keeping count), via a hash table constructed so
that it also functions as the word list for step 2.
2. Sort the word list by Shell's method.
3. Append the word count to each line, storing all this result-text in a
buffer.
4. Dump the entire buffer into an edit field in one hit.
This is all plain vanilla FB^3 code. No assembler, and the only Toolbox
calls used in implementing the method are HLock, HUnlock, BlockMove and
TESetText.
On my iMac, a range of 32000-byte English text files take an absurdly small
5-6 ticks each to process; the spinning beach ball cursor is hard to see in
that time. If you are going to be timing rival entries, you would be
advised to use a slower Mac for more precision in the tick value.
Robert P.
--
To unsubscribe, send ANY message to <futurebasic-unsubscribe@...>