![]() ![]() |
Aug 29 2003, 02:02 PM
Post
#1
|
|||||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 1,801 Joined: 15-April 03 From: London, Ontario, Canada Member No.: 14 |
I know this topic has come up before, but I'd like to re-open the discussion of the best ways to rapidly compare two datasets to distinguish the differences between them... The example that springs most readily to mind (probably because it's what I'm working on), is with the DSWorld dataset. I want to be able to take "snapshots" of the decal environment at intervals, and be able to discern (instantly?) what has changed in the interval... Currently I have an involved code-structure configured as follows: Initialisation:
During Execution:
As you can see, this is pretty complicated, and involves a lot of code, which, subsequently, takes a proportionately long time to execute. This correspondingly reduces the actual accuracy of the resulting data. I'm looking for a better way... Any insight or suggestions? -Triane -------------------- SQL>SELECT * FROM users WHERE clue > 0;
0 rows returned SQL>_ |
||||
|
|
|||||
Aug 29 2003, 03:35 PM
Post
#2
|
|||||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
Triane, I did it pretty much like this and honestly it doesn't take that long. I did it in my last seen part of my macro and it updates every 3 secs (snapshots) and really with 40-60 people it took hardly anytime at all on a 700mhz classic Athlon. I set it for 3 sec because I felt thats good enough but it could be squeezed down to 1 sec and then it would do nothing else but that task. I think we may have done some things a little different though (mine was as tweaked as I could get it at the time). This is the code and it might not make sense out of context but here goes When the macro first fires up: Initialization
During Execution
The dataset whoisaroundme is this Dataset whoisaroundme Name = String 50 End I hope you can make heads or tails out of this cause it took me about 24 hours to tweak it to a point you can't see the lag. The only way I can see it being faster is as a plugin. |
||||
|
|
|||||
Aug 29 2003, 03:50 PM
Post
#3
|
|
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
Oh, and watch out for ghosts cause the worldfilter will rip you a new one (and reason I have to distance you). LoadDecalWorld DISTANCE will give you ghosts too so be careful.
|
|
|
|
Aug 29 2003, 03:56 PM
Post
#4
|
|
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Moderator Posts: 3,366 Joined: 12-April 03 From: Vancouver, Canada Member No.: 5 |
Efficiency wise, you can tweak this to perform less dataset lookups if you slap an ascending index on the GUID field in both of them.
You then grab the 1st record from each DS and run this algorithm until EOF in both datasets. If Backup[GUID] = Current[GUID] // Nothing to do - DSNext in both of them Else If Backup[GUID] < Current[GUID] // Object was there, now it's not. Add to LeftScene then DSNext in Before Else // Object wasn't there, not it is. Add to NewOnScene and DSNext in After End End Of course, you could request that Cam throw together a SQL parser for datasets, so you could express this as: SELECT Before.*, After.* FROM Before FULL OUTER JOIN After on Before.GUID = After.GUID j/k -------------------- Please read the Forum Posting Guidelines AND the FAQ Forum before posting.
Don't waste your time posting a bug report until you've read FAQ: How do I report a bug ? |
|
|
|
Aug 29 2003, 04:07 PM
Post
#5
|
|
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
I sorta tried it like that at first but not until I came up with how I finally HAD to do it di it work right. A lot of it is because of ghosts and stuff. Might re look at it again but after 24 hours back then I said fook it it works, its fast enough to do what I want so leave it be.
See the big HUGE, nay, ENORMOUS problem is that I would do my compares thinking, okay OLDDATASET is 10 long, NEW IS 10 LONG now scan each. PROBLEM was that 10 could be 9 people that were here and aren't now and 1 new person or any combo. If the world filter actually updated right then the scheme could be done in one very simple procedure. If Backup[GUID] = Current[GUID] // Nothing to do - DSNext in both of them Else This next line makes no sense to me because your GUID might be greater than mine but it could be less (in which case this would fire). Can you explain this a bit more? If Backup[GUID] < Current[GUID] // Object was there, now it's not. Add to LeftScene then DSNext in Before Else // Object wasn't there, not it is. Add to NewOnScene and DSNext in After End End In that simple routine of yours it makes no allowances for Ghosts or false echoes that you will get all the time. As I said my simple routine started out very similiar to this but with all the false echoes I got I had to tweak and tweak and twe... |
|
|
|
Aug 29 2003, 04:59 PM
Post
#6
|
|
|
Insane Poster ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 241 Joined: 15-April 03 Member No.: 13 |
perhaps this example will help explain how the algorithm Ipa posted works:
take two decks of cards, and randomly remove 20 cards from each deck. now, what would be a good way to find out which cards are present only in the *first* stack, which cards are present only in the *second* stack, and which cards are present in *both* stacks, if you're only allowed to look at two cards at a time (as a computer does)? first, sort each stack of 20 cards. aces through kings, grouped by spades then hearts then diamonds then clubs. place the two sorted stacks side by side, with the lowest card facing up. now, look at the lowest card of each stack. if both cards are the same, take both of them and put them in the 'both' pile. if the card in the left stack is lower, remove only the left card and put it in the 'left stack only' pile. if the card in the right stack is lower, remove only the right card and put it in the 'right stack only' pile. repeat the preceding paragraph until there are no cards left. using this method, you can find out what's the same and what's different between two stacks of cards (or sets of objects) by doing only one comparison for each card you sort, and each comparison is guaranteed to properly sort at least one card, possibly two. for 20 cards, that's between 20 and 39 comparisons, depending on the particular cards you got. if the sets are mostly the same, the average number of comparisons will be near 20. the alternative is to take an arbitrary card from the first stack, one at a time, then compare it to every card in the second stack to see if it matches any of them. in the worst case, that's 400 comparisons. if the sets are mostly the same, the average will still be over 100. For a dataset containing 60 objects, the difference is even more extreme. This is because Ipa's algorithm grows linearly (so that 3 times the number of items takes 3 times as long). The alternative algorithm grows quadratically (n^2) (so that 3 times the number of items takes 9 times as long). -ken |
|
|
|
Aug 29 2003, 05:12 PM
Post
#7
|
|||||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Moderator Posts: 3,366 Joined: 12-April 03 From: Vancouver, Canada Member No.: 5 |
I was commenting on the algorithm in general. Given 2 sets of data that have a field that uniquely identifies each record, you can efficiently determine differences by sorting on the unique field and comparing 1 record at a time from each set. Using a mathematical greater than/less than operation on an AC GUID is obviously meaningless. However, in the context of 2 datasets sorted by GUID, it provides the means to determine when a GUID has been added or removed from either set.
After dealing with GUID's "2222" in each set, you'll fetch next from both sets. Due to the same sort order on both sides, when you hit the condition of 3333 being less than 4444, you'll know a record has disappeared. You then keep fetching from the "Backup" side until you have a match again, or find a "hole" on the other side. Again, just a general purpose algorithm that I presented as being somewhat more efficient than looping each set and doing a DSLocate into the other. [Edit] Not only did Ken beat me to it, but I like the 2 decks of cards analogy better -------------------- Please read the Forum Posting Guidelines AND the FAQ Forum before posting.
Don't waste your time posting a bug report until you've read FAQ: How do I report a bug ? |
||||
|
|
|||||
Aug 29 2003, 05:13 PM
Post
#8
|
|
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
Ahhh, okay Ipa.
Ken, like a bubble sort? |
|
|
|
Aug 29 2003, 05:22 PM
Post
#9
|
|||
|
Insane Poster ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 241 Joined: 15-April 03 Member No.: 13 |
I'm not sure exactly what you're referring to, but yes, a bubble sort is another example of an n^2 algorithm that does far more comparisons than are necessary to get the job done. If you're familiar with merge sorting, it's a very similar idea, except that in this case, we're interested in comparing two sorted lists, rather than merging them together into a single sorted list. -ken |
||
|
|
|||
Aug 29 2003, 07:47 PM
Post
#10
|
|
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
Ipa, what happens when one set is smaller than the other?
1111 1111 2222 2222 3333 4444 4444 5555 5555 EOF EOF Where your blank is at wouldn't be a blank but all the remaing data would have been shifted up one notch. Therefore, if I am not mistaken, from that point forward they would all be <, right? |
|
|
|
Aug 29 2003, 09:56 PM
Post
#11
|
|||||||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 1,801 Joined: 15-April 03 From: London, Ontario, Canada Member No.: 14 |
Pardon me a moment, I'm thinking out loud First "snapshot" {Backup}:
Second "snapshot": (player number 222222 arrives, player 555555 leaves, player 777777 also arrives) {Current}
Leaving us with the following sorted datasets:
So does the logic of "If Backup[GUID] < Current[GUID]" et al still apply? Because (for example) 333333 and 44444 exist in both lists, but won't be compared against themselves (to see if they're equal), and will cause all subsequent tests to be made inaccurately.... right? -Triane -------------------- SQL>SELECT * FROM users WHERE clue > 0;
0 rows returned SQL>_ |
||||||
|
|
|||||||
Aug 30 2003, 12:07 AM
Post
#12
|
|||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Moderator Posts: 3,366 Joined: 12-April 03 From: Vancouver, Canada Member No.: 5 |
DaMob: Yes, the pseudocode is simplistic and doesn't specifically deal with EOF. If you hit EOF in one set, you automatically know that the remainder in the other set are unmatched, so you need to add an Or condition to the comparisons to check EOF. Triane: Re-reading my example, I had some typos (I had originally named the example sets 'Before' & 'After" then decided to rename them to match your code, but didn't globally rename in the sample pseudo-code). Re-worded:
When you find a mis-match, you only DSNext on 1 of the sets. In your example, Backup = 333333 : Current = 222222 That's the "Else" block above: add to "NewOnScene" (222222 is new on the scene) and move forward in CURRENT (but not in BACKUP). Both sets are now at 333333 - nothing to do. Move both forward by 1. etc. Expressing the sets side-by-side may not have been the smartest way to get the concept across, because it implies you're matching the rows when you're not. Each "Column" of Backup & Current should be seen as a separate stack, moving independently. That's why I like Ken's real-world example with the 2 decks of cards, it gets the picture across more clearly. -------------------- Please read the Forum Posting Guidelines AND the FAQ Forum before posting.
Don't waste your time posting a bug report until you've read FAQ: How do I report a bug ? |
||
|
|
|||
Aug 30 2003, 01:18 AM
Post
#13
|
|||
![]() Lord of the Forums ![]() ![]() ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 3,409 Joined: 12-April 03 Member No.: 6 |
Something like this (haven't tested this)?
|
||
|
|
|||
Sep 2 2003, 09:11 PM
Post
#14
|
|
|
Cool Newbie ![]() ![]() Group: Members Posts: 22 Joined: 14-July 03 Member No.: 1,155 |
Man, I got lost three lines into this stuff.
I don't suppose any kind sould would mind doing an even more detailed explanation of what the most important lines of code are doing? Kind of a "Dataset for Dummies" primer? |
|
|
|
Sep 3 2003, 03:01 PM
Post
#15
|
|||||||||||||||||
|
Insane Poster ![]() ![]() ![]() ![]() ![]() Group: Members Posts: 241 Joined: 15-April 03 Member No.: 13 |
perhaps it would help if you arrange your 'sample' datasets this way:
111 = 111, so proceed to the next item in both datasets:
222 = 222, so proceed to the next item in both datasets:
333 < 444, so mark '333' as 'left scene', and move to the next item in 'old' only:
444 = 444, so proceed to the next item in both datasets:
555 = 555, so proceed to the next item in both datasets:
777 > 666, so mark '666' as 'newly arrived', and move to the next item in 'new' only:
777 = 777, so proceed to the next item in both datasets:
old is empty, so everything left in new must be 'newly arrived'. if new had been empty, everything left in old would have been 'left scene'. -ken |
||||||||||||||||
|
|
|||||||||||||||||
![]() ![]() |
| Lo-Fi Version | Time is now: 3rd September 2010 - 07:28 AM |