Hi, I am not sure if this is the right place to post this, but I couldn't find an algorithms section or anything similar...

I am doing the Six Degrees of Kevin Bacon implementation.

For those who do not know, the problem states that every actor/actress is related to Kevin Bacon: either by appearing in a movie with him directly, or by being in a movie with someone who has been in one with Bacon, or with someone who has been with someone who has been with Bacon... and so on.

"The computation of a Bacon number for X is a "shortest path" algorithm. It involves computing the Bacon number for all film actors who X has acted with:

If X has appeared in a movie with Bacon, X's Bacon number is 1

If X does not have a Bacon number of 1 (hasn't appeared in a Bacon movie) but has appeared in a movie with someone who has, X's Bacon Number is 2

If X does not have a Bacon number of 2, the computation continues, examining all actors with whom X has been in a movie. The Bacon number of X is 1 plus the lowest Bacon number of all actors who have appeared in a movie with X. " (Wikipedia: http://en.wikipedia.org/wiki/Bacon_number)

For example, also from wikipedia:

Pope John Paul II (Bacon number: 3)

Pope John Paul II was in Padre Pio — Tra cielo e terra (2000) with Giovanni Lombardo Radice.

Giovanni Lombardo Radice was in The Omen (2006) with Vee Vimolmal.

Vee Vimolmal was in Where the Truth Lies (2005) with Kevin Bacon.

I hope the problem is understood... I am given a text file with lots of movies and their actors. The file contains thousands of lines like this one:

Name of the movie (1970): Bacon, Kevin/ Alley, Kirstie/ Another, actor/ Another, actress/...... and so on...

Name of movie 2 (1969): More, actors/ More, actresses/.... and so on

So I thought about doing it with a BFS... but I don't know which data structures to use for this problem. I also can't figure out how to model the problem for BFS: I suppose each actor is a vertex, but how? I just can't draw the approach in my head... If someone could give me some light here, I would really appreciate it!!! I am really lost; I have been thinking about this for about a week, so hopefully someone puts me on the road...
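In graph terms: each actor is a vertex, two actors are joined by an edge whenever they appear in the same movie, and a BFS from the Kevin Bacon vertex labels every reachable actor with its Bacon number. A minimal Python sketch of that model (illustrative only; the parsing assumes the sample line format shown above, and the data is made up):

```python
from collections import deque

def parse_movies(lines):
    """Map each movie title to its cast list.
    Assumed line format: 'Title (1970): Last, First/ Last, First/ ...'"""
    movies = {}
    for line in lines:
        title, _, cast = line.partition(':')
        if cast:
            movies[title.strip()] = [a.strip() for a in cast.split('/') if a.strip()]
    return movies

def bacon_numbers(movies, start='Bacon, Kevin'):
    """BFS over the actor graph: two actors are adjacent
    iff they appeared in a movie together."""
    costars = {}
    for cast in movies.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(cast)
    numbers = {start: 0}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        for co in costars.get(actor, ()):
            if co not in numbers:        # first visit = shortest path
                numbers[co] = numbers[actor] + 1
                queue.append(co)
    return numbers

# Hypothetical mini-file with the same layout as the real one:
lines = [
    "Movie A (1970): Bacon, Kevin/ Alley, Kirstie",
    "Movie B (1969): Alley, Kirstie/ Doe, John",
]
print(bacon_numbers(parse_movies(lines)))  # 'Doe, John' gets Bacon number 2
```

With the real file you would read lines from disk instead of a list; BFS touches each edge once, so even the 20 MB file should be quick once it's parsed.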

Thanks a lot! Thanks for reading this!!

Please, could you post the file?

In case you are wondering, Snake, the contents of the file and how they are arranged are vital in picking the best suited algorithm.

Sounds a bit like homework, but at least a pretty interesting topic for once :)

Thanks! :) This is a compact version of the file; the real one is about 5 MB, and there is another one that is about 20 MB... I'm posting the small one since it is the same as the others but with fewer movies...

This reminds me of the classic 'Travelling Salesman' problem, or even the 'Dining Philosophers' problem.

How does it relate to the dining philosophers' problem? I don't see any relation at all :s

Any ideas?

How about something like:

MOVIE struct
  MovieID     dd ?
  MovieName   db 32 dup (?)
  NumActors   dd ?
  BaconNumber dd ?
  Actors      dd 0 dup (?) ; variable-size! an array of ActorIDs
MOVIE ends

ACTOR struct
  ActorID     dd ?
  ActorName   db 32 dup (?)
  NumMovies   dd ?
  BaconNumber dd ?
  Movies      dd 0 dup (?) ; variable-size! an array of MovieIDs
ACTOR ends

AllMovies dd 0 ; ptr to the MOVIE ptr array (pointer to pointer)
AllActors dd 0 ; ptr to the ACTOR ptr array (pointer to pointer)
NumMovies dd 0
NumActors dd 0
CurBacon  dd 0

InitArrays proc
    ;-----------------------\
    ; here, you'll be constantly increasing the size of the AllMovies and AllActors arrays,
    ; and allocating+resizing the MOVIE and ACTOR structs, when a new actor/movie is added.
    ;...
    ;-----------------------/
    ret
InitArrays endp

FindBaconNumbersFor proc ActorID
    ;------[ for each MOVIE and ACTOR, set BaconNumber to 0 ]------\
    ;...
    ;--------------------------------------------------------------/
    ;---[ global registers! ]---\
    xor edx,edx          ; CurBacon
    mov esi,AllMovies
    mov edi,AllActors
    ;---------------------------/
    mov eax,ActorID      ; EnumMovies takes the ActorID in EAX
    call EnumMovies
    ret
FindBaconNumbersFor endp

EnumMovies proc uses ebx ecx ; ActorID is in EAX
    mov ebx,[edi+eax*4]      ; ebx = ptr to this ACTOR
    inc edx
    cmp [ebx].ACTOR.BaconNumber,0
    mov ecx,[ebx].ACTOR.NumMovies
    jne _done
    mov [ebx].ACTOR.BaconNumber,edx
    add ebx,sizeof ACTOR     ; now ebx becomes the address of the Movies[] array
_next:
    dec ecx
    jl _done
    mov eax,[ebx+ecx*4]      ; eax = MovieID
    call EnumActors          ; can be optimized with a "push _next | jmp EnumActors"
    jmp _next
_done:
    dec edx
    ret
EnumMovies endp

EnumActors proc uses ebx ecx ; MovieID is in EAX
    mov ebx,[esi+eax*4]      ; ebx = ptr to this MOVIE
    cmp [ebx].MOVIE.BaconNumber,0
    mov ecx,[ebx].MOVIE.NumActors
    jne _done
    mov [ebx].MOVIE.BaconNumber,edx
    add ebx,sizeof MOVIE     ; now ebx becomes the address of the Actors[] array
_next:
    dec ecx
    jl _done
    mov eax,[ebx+ecx*4]      ; eax = ActorID
    call EnumMovies          ; can be optimized with a "push _next | jmp EnumMovies"
    jmp _next
_done:
    ret
EnumActors endp

A bit of optimization would be to have two separate dd arrays for MovieBacon and ActorBacon. (bacon!=0 means "visited" already)

Haven't tested it, and can't guarantee the idea is ok ^^"

P.S. A slight error in the code:

initially, set everything's BaconNumber to 2,000,000,000

and modify at both places

cmp [ebx].MOVIE.BaconNumber,0
mov ecx,[ebx].MOVIE.NumActors
jne _done

into

cmp [ebx].MOVIE.BaconNumber,edx
mov ecx,[ebx].MOVIE.NumActors
jle _done
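In higher-level terms, that fix turns the traversal into the classic relaxation step: start every node at a huge sentinel and only revisit it when you arrive with a strictly smaller number. A hypothetical Python rendering (not the poster's code; the toy graph is made up):

```python
INF = 2_000_000_000  # "initially, set everything's BaconNumber to 2,000,000,000"

def relax(bacon, neighbours, start):
    """Depth-first relaxation: revisit a node whenever we
    reach it with a strictly smaller number (the cmp/jle fix)."""
    def visit(node, depth):
        if bacon[node] <= depth:   # cmp BaconNumber,edx / jle _done
            return
        bacon[node] = depth        # mov BaconNumber,edx
        for nxt in neighbours[node]:
            visit(nxt, depth + 1)
    visit(start, 0)

# Hypothetical toy graph: KB -- A -- B
neighbours = {'KB': ['A'], 'A': ['KB', 'B'], 'B': ['A']}
bacon = {n: INF for n in neighbours}
relax(bacon, neighbours, 'KB')
print(bacon)  # {'KB': 0, 'A': 1, 'B': 2}
```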

Looks good; however, would that be efficient on a big file? It looks like it would take a while, but I am not sure. :p

A big file? The whole idea, as I've understood it, is to compute the Bacon number of _every_ actor, so it's best to put the necessary data into RAM. Let's assume you have data for 1 million actors and 200,000 movies, with 15 actors per movie and 20 movies per actor on average. Remove the unnecessary MovieID and MovieName from the MOVIE struct and their respective members from the ACTOR struct, and move the BaconNumbers outside of the structs: a total of 106,400,000 bytes of RAM. Plenty of headroom before the 2 GB limit.
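For what it's worth, here is one accounting that reproduces that figure (my guess at the intended breakdown; every entry is a 4-byte dd):

```python
actors, movies = 1_000_000, 200_000
avg_cast, avg_filmography = 15, 20
dd = 4  # bytes per dd entry

actor_structs = actors * (dd + avg_filmography * dd)  # NumMovies + Movies[]
movie_structs = movies * (dd + avg_cast * dd)         # NumActors + Actors[]
bacon_arrays  = (actors + movies) * dd                # BaconNumbers moved outside the structs
ptr_arrays    = (actors + movies) * dd                # AllActors/AllMovies pointer arrays

total = actor_structs + movie_structs + bacon_arrays + ptr_arrays
print(total)  # 106400000
```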

My idea is similar to the Bellman-Ford algorithm, but with the optimization shortcut of having "Movies" represent groups of edges. Otherwise you'd have to build one graph where actors are vertices and each vertex is connected to ALL actors from ALL movies the current actor has played in. In other words, when an actor-vertex has 300 (20*15) edges, we needn't iterate/check them all again (in case they have already been visited with a shorter/same path length): we just iterate 20 times. For even further optimization, maybe we can add some marking of visited/next-to-visit movies, to do the more efficient breadth-first enumeration. But I guess that's fine-tuning to do after you have converted the current data into suitable MovieID/ActorID/Movies[]/Actors[] pure-integer data and checked the speed of the current ideas.
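That movie-grouping shortcut amounts to BFS on a bipartite actor-movie graph, where each movie node is expanded at most once instead of re-walking all ~300 actor-actor edges. A small illustrative Python sketch (all names made up):

```python
from collections import deque

def bacon_bipartite(actor_movies, movie_actors, start):
    """BFS alternating actor -> movie -> co-actor levels.
    Each movie is expanded at most once, so a 300-edge
    neighbourhood costs only ~20 movie lookups per actor."""
    bacon = {start: 0}
    seen_movies = set()
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        for movie in actor_movies[actor]:
            if movie in seen_movies:
                continue
            seen_movies.add(movie)
            for co in movie_actors[movie]:
                if co not in bacon:
                    bacon[co] = bacon[actor] + 1
                    queue.append(co)
    return bacon

# Hypothetical data: KB and A share M1; A and B share M2
movie_actors = {'M1': ['KB', 'A'], 'M2': ['A', 'B']}
actor_movies = {'KB': ['M1'], 'A': ['M1', 'M2'], 'B': ['M2']}
print(bacon_bipartite(actor_movies, movie_actors, 'KB'))  # {'KB': 0, 'A': 1, 'B': 2}
```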

Thanks very much, Ultrano... I'll see how that goes... I still find your two procedures EnumMovies and EnumActors confusing, along with the way they find the Bacon number... I would greatly appreciate some hints here :p

InitArrays proc
    ;-----------------------\
    ; here, you'll be constantly increasing the size of the AllMovies and AllActors arrays,
    ; and allocating+resizing the MOVIE and ACTOR structs, when a new actor/movie is added.
    ;...
    ;-----------------------/
    ret
InitArrays endp

I've made a set of routines for the above task. When you need additional memory: invoke AdjustMemory, ptr MEMORY. AdjustMemory will allocate more memory (if needed) and return the pointer to the current free memory in EAX, and the pointer in ECX to...hmm, read the docs ;) AdjustMemory is designed for fixed-size structures and so on, but can be used here as well.

example:

    movies MEMORY <100* sMOVIE, 5* sMOVIE>

    ;when adding a new MOVIE
    invoke AdjustMemory, o movies
    assume eax:pMOVIE, ecx:ptr POINTER
    add .sMOVIE
    mov [eax].BaconNumber, 0
    ;...
    assume eax:nothing, ecx:nothing

You can get it from www.beotel.net/~astancic (click on "Memory.zip")

(Comments are in quick and poor English, sorry.)

what? bread and bacon? :P

:D

ok, i'll leave now...
