Hi, I am not sure if this is the right place to post this, but couldn't find any algorithms section or something similar...

I am doing the Six Degrees of Kevin Bacon implementation.

For those who do not know, the problem states that all actors/actresses are related to Kevin Bacon by appearing directly in a movie with him, or by being in a movie with someone who has been with Bacon, or someone who has been with someone who has been with someone who has been... etc with Bacon.

"The computation of a Bacon number for X is a "shortest path" algorithm. It involves computing the Bacon number for all film actors who X has acted with:

If X has appeared in a movie with Bacon, X's Bacon number is 1
If X does not have a Bacon number of 1 (hasn't appeared in a Bacon movie) but has appeared in a movie with someone who has, X's Bacon Number is 2
If X does not have a Bacon number of 2, the computation continues, examining all actors with whom X has been in a movie. The Bacon number of X is 1 plus the lowest Bacon number of all actors who have appeared in a movie with X. " (Wikipedia: http://en.wikipedia.org/wiki/Bacon_number)

For example, also from wikipedia:

Pope John Paul II (Bacon number: 3)

Pope John Paul II was in Padre Pio Tra cielo e terra (2000) with Giovanni Lombardo Radice.
Giovanni Lombardo Radice was in The Omen (2006) with Vee Vimolmal
Vee Vimolmal was in Where the Truth Lies (2005) with Kevin Bacon

I hope the problem is understood...  I am given a text file with lots of movies and their actors. The file contains thousands of lines like this one:

Name of the movie (1970): Bacon, Kevin/ Alley, Kirstie/ Another, actor/ Another, actress/...... and so on...

Name of movie 2 (1969): More, actors/ More, actresses/.... and so on

So I thought about doing it with a BFS search... But I don't know how to use data structures for this problem... Also, I cannot figure the problem to use the BFS search, I suppose each actor is a vertex, but how? I mean, I just can't draw the approach in my head... If someone can give me some light here, I would really appreciate!!! I am really lost, have been thinking about this for like a week but... well hopefully someone puts me on the road...

Thanks a lot! Thanks for reading this!!



Posted on 2007-08-04 19:34:38 by Snake


Please, could you post the file?



In case you are wondering, Snake, the contents of the file and how they are arranged are vital in picking the best suited algorithm.
Posted on 2007-08-05 10:28:56 by SpooK
Sounds a bit like homework, but at least a pretty interesting topic for once :)
Posted on 2007-08-05 11:19:44 by f0dder
Thanks! :) This is a compact version of the file, the real one is like 5 MB, and then there is another one which is like 20 MB... I post the small one since it is the same as the other ones but with less movies...

Posted on 2007-08-05 14:27:45 by Snake
This reminds me of the classic 'Travelling Salesman' problem, or even the 'Dining Philosophers' problem.
Posted on 2007-08-05 22:19:44 by Homer

This reminds me of the classic 'Travelling Salesman' problem, or even the 'Dining Philosophers' problem.



How does it relate to the dining philosophers' problem? I don't see any relation at all :s
Posted on 2007-08-05 22:29:24 by Snake
Any ideas?
Posted on 2007-08-07 11:09:40 by Snake
How about something like:

MOVIE struct
MovieID dd ?
MovieName db 32 dup (?)
NumActors dd ?
BaconNumber dd ?
Actors dd 0 dup (?) ; variable-size! , is an array of ActorID's
MOVIE ends

ACTOR struct
ActorID dd ?
ActorName db 32 dup (?)
NumMovies dd ?
BaconNumber dd ?
Movies dd 0 dup (?) ; variable-size! , is an array of MovieID's
ACTOR ends


AllMovies dd 0 ; ptr to the MOVIE array (pointer to pointer)
AllActors dd 0 ; ptr to the ACTOR array (pointer to pointer)
NumMovies dd 0
NumActors dd 0
CurBacon  dd 0



InitArrays proc
;-----------------------\
; here, you'll be constantly increasing the size of the AllMovies and AllActors arrays,
; and allocating+resizing the MOVIE and ACTOR structs, when a new actor/movie is added.
;...
;-----------------------/
ret
InitArrays endp


FindBaconNumbersFor proc ActorID
;------[ for each MOVIE and ACTOR, set BaconNumber to 0 ]------[
;...
;--------------------------------------------------------------/
;---[ global registers! ]---\
xor edx,edx ; CurBacon
mov esi,AllMovies
mov edi,AllActors
;---------------------------/

invoke EnumMovies,ActorID
ret
FindBaconNumbersFor endp

EnumMovies proc uses ebx ecx ; ActorID is in EAX
mov ebx, ; ebx = AllActors
inc edx
cmp .ACTOR.BaconNumber,0
mov ecx,.ACTOR.NumMovies
jne _done
mov .ACTOR.BaconNumber,edx
add ebx,ACTOR ; now ebx becomes the address of the Movies[] array
_next:
dec ecx
jl _done
mov eax,
call EnumActors ; can be optimized with a "push _next | jmp EnumActors"
jmp _next
_done:
dec edx
ret
EnumMovies endp

EnumActors proc uses ebx ecx ; MovieID is in EAX
mov ebx, ; ebx = AllMovies
cmp .MOVIE.BaconNumber,0
mov ecx,.MOVIE.NumActors
jne _done
mov .MOVIE.BaconNumber,edx
add ebx,MOVIE ; now ebx becomes the address of the Movies[] array
_next:
dec ecx
jl _done
mov eax,
call EnumMovies ; can be optimized with a "push _next | jmp EnumMovies"
jmp _next
_done:
ret
EnumActors endp


A bit of optimization would be to have two separate dd arrays for MovieBacon and ActorBacon. (bacon!=0 means "visited" already)

Haven't tested it, and can't guarantee the idea is ok ^^"
Posted on 2007-08-07 12:35:45 by Ultrano
P.S. A slight error in the code:

initially, set everything's BaconNumber to 2,000,000,000
and modify at both places


cmp .MOVIE.BaconNumber,0
mov ecx,.MOVIE.NumActors
jne _done

into

cmp .MOVIE.BaconNumber,edx
mov ecx,.MOVIE.NumActors
jle _done
Posted on 2007-08-09 09:50:03 by Ultrano
Looks good, however, would that be efficient on a big file? Looks like it would take a while, but I am not sure though :p
Posted on 2007-08-09 11:23:45 by Snake
A big file? The whole idea, as I've understood, is to compute the Bacon number of _every_ actor. So, it's necessary/best to put the necessary data into RAM. Let's assume you have the data of 1 million actors, 200000 movies, 15 actors average per movie, 20 movies average per actor. Let's remove the unnecessary MovieID and MovieName from the Movie struct and their respective members in the Actor struct, and move the BaconNumber outside of the structs. Total of 106,400,000 bytes RAM. Plenty of space to waste to reach the 2GB limit.

My idea is similar to the Bellman-Ford algorithm, but using the optimization-shortcut of having "Movies" present groups of edges. Because otherwise you'd have to make 1 graph, where actors are vertices, and each vertex is connected to ALL actors from ALL movies the current actor has played in. In other words, when an actor-vertex has 300 (20*15) edges, we needn't iterate/check them all again (in the case of them having been already visited with a shorter/same path-length): we just iterate 20 times. For even further optimization, maybe we can add some  marking of visited/next-to-visit Movies, to do the more efficient breadth-first enumeration. But I guess that's already some fine-tuning to do after you have converted the current data into suitable MovieID/ActorID/Movies[]/Actors[] pure-integer data, and checked the speed of the current ideas.
Posted on 2007-08-09 11:51:25 by Ultrano
Thanks very much Ultrano... I'll see how that goes... I still find confusing your two procedures EnumMovies and EnumActors and the way they find the bacon number... I would greatly appreciate if you gave some hints in here :p
Posted on 2007-08-10 23:33:00 by Snake


InitArrays proc
;-----------------------\
; here, you'll be constantly increasing the size of the AllMovies and AllActors arrays,
; and allocating+resizing the MOVIE and ACTOR structs, when a new actor/movie is added.
;...
;-----------------------/
ret
InitArrays endp



I've made a set of routines for the above task.

When you need additional memory: invoke AdjustMemory, ptr MEMORY.

AdjustMemory will allocate more mem (if needed) and return
the pointer to current free mem in EAX and the pointer in ECX to...hmm, read the docs;)

AdjustMemory is designed for fixed-size structures and so, but can be used here, as well.

example:

movies MEMORY <100* sMOVIE, 5* sMOVIE>

;when adding a new MOVIE
invoke AdjustMemory, o movies

assume eax:pMOVIE, ecx:ptr POINTER
add .sMOVIE

mov .BaconNumber, 0
.
.
.
assume eax:nothing, ecx:nothing

You can get it from www.beotel.net/~astancic (click on "Memory.zip")
(Comments are in quick and poor english, sorry.)
Posted on 2007-08-11 02:04:27 by aleksaZR
what? bread and bacon?  :P
:D

ok, i'll leave now...
Posted on 2007-08-14 07:05:48 by HeLLoWorld