I'm devoting this thread to my new p2p protocol.
It borrows heavily from two existing protocols which are Bit Torrent and Mute .. the former is insecure and fast, while the latter is secure and slow.
HP2P is my humble offering to fill the gap, offering imho the best of both worlds.

Mute was the world's first decentralized p2p protocol to use "Ant Routing".
It was modelled after the behaviour of ant colonies in nature.
Ants leave a "pheromone trail" wherever they go. The wandering of ants is controlled partly by randomness and partly by their attraction to existing pheremone trails. Before too long, strong trails are formed between the ant nest and one or more "goals" (for example, food).
"Ant Routed" networks rely heavily on relaying.
Within an "Ant Routed" network, each Peer User knows only the IP addresses of their immediate Peers. Communication is achieved not via IP address but via a randomly generated "Virtual User ID". Let's abstract the notion of a peer session by imagining N DOORS to other Users. Beyond each "Door" lays a potentially infinite number of Virtual Users.
When incoming packets arrive via one of these Doors, we make a note of the Virtual User ID of the Sender, and associate it with that Door (we also record a "Hop Score" for that VUI - more on that later).
Should subsequent packets arrive destined for that user (or should we wish to send packets to that user), we select the Door leading to that User which has the lowest HopScore. In this fashion, interpeer routes are established which are efficient yet dynamic, and which do not rely on the ip addresses of the endpoints.
Now here is the gold in this rock.
Even though you know the ip addresses of your immediate peers (and they know yours), it's impossible to associate an ip address with a virtual address, there's no way of telling if a peer sent you a request, or if some user beyond them sent it and they merely relayed it. File transfers cannot be associated with ip addresses unless ALL YOUR PEERS belong to the attacker. More Doors = More Speed = More Security :)

I'm actually quite far into implementing this new protocol, and I will describe its packets in detail in subsequent postings, but I am very keen to hear any thoughts you may have.

Merry Christmas,
Homer.
Posted on 2004-12-24 07:03:33 by Homer
The HP2P protocol uses a standard PacketHeader which may or may not be followed by a PacketPayload. Here is a structure which describes the standard HP2P PacketHeader.



PacketHeader struct
Protocol dd ? ;<-- Always "HP2P" (similar to "YMSG" header)
PacketSize dd ? ;<-- Size of complete packet including Header
PacketType dd ? ;<-- Packet Type identifier (see below)
HopScore dd ?
CRC MD5HASH <?> ;<-- Packet CRC (used to identify corrupt and duplicated packets)
VID MD5HASH <?> ;<-- User Virtual ID is actually an md5 hash of some random values
PacketHeader ends


The first DWORD (offset 0) of any HP2P packet contains the four bytes "HP2P".
We can use this to identify malformed packets.
The next DWORD (offset 4) contains the FULL PacketSize, which is INCLUSIVE of the PacketHeader, ie PayloadSize + HeaderSize.
The next DWORD (offset 8 ) contains the PacketType Identifier, which describes what kind of HP2P packet this is.
The next DWORD (offset 12) contains the HopScore, which gives an indication as to the number of Hops which the packet has travelled since it was created.
The next field (offset 16) contains a CRC for the packet payload which can be used to further identify malformed packets. It is in reality an MD5 Hash of the packet payload, stored as four DWORDS.
The final field (offset 32) contains another MD5 Hash, this one is the Virtual User ID of the Sender - this is randomly generated by the Sender.
Only the Sender is capable of recognizing their own Virtual User ID, but cleverly, when the Sender sees its own VID in reply packets, it relays the packets onwards regardless (a further security measure to prevent corraling).

All the packets I will go on to describe have this PacketHeader in common.
Some of you might notice similarities between it and Yahoo protocols, this is not an accident, it's my personal experience with protocol headers.
Posted on 2004-12-24 08:00:33 by Homer
The following enumerated PacketTypes have been defined:
P2P_HELLOSERVER equ 0 ;handshake
P2P_HELLOCLIENT equ 1 ;handshake
P2P_STRINGSEARCH equ 2 ;search for string
P2P_FILEREPLY equ 3 ;reply to stringsearch with filehashes
P2P_HASHREQUEST equ 4 ;search for users with this file
P2P_HAVEPIECES equ 5 ;reply to hashrequest with piecemap
P2P_PIECEREQUEST equ 6 ;request for specific piece
P2P_PIECEREPLY equ 7 ;reply to piecerequest with data

Protocol handshaking is initiated by the Client sending the Server a "HelloServer" packet. This Packet consists of a PacketHeader with NO PAYLOAD, and its fields look like this:

;PacketHeader struct
Protocol dd "HP2P"
PacketSize dd sizeof PacketHeader
PacketType dd P2P_HELLOSERVER
HopScore dd Random (90,110)
CRC MD5HASH <HashOfThisHeader>
VID MD5HASH <NULL>
;PacketHeader ends

When the Server sees this packet, the Server replies with a "HelloClient" packet. As for its format, we merely change the PacketType and then echo the packet back to the Client. At this moment, as far as the Server is concerned, a Session has been established. When the Client receives this echoed Greeting packet, it too assumes that a Session is established.
Either party may now make requests over the existing Session. Since the server sends its Greeting reply before anything else, its safe for the Server to send queries immediately following the the Greeting reply, because the Client will handle packets in the order they were received.. thus it will see the Greeting Reply, then subsequent requests if any.
Posted on 2004-12-24 08:40:39 by Homer
If there is a central cache of all hosts then it may be possible for an attacker to connect to each host, request a file and see who responds with a 0 hop count.
Posted on 2004-12-24 15:44:45 by QuantumMatrix1024
Thanks for the first feedback posted :)

When the testbed servant app starts up, it connects to one of a number of webservers and fetches a pagefull of recent peer ip addresses via a php script.. in doing so, its own ip is logged (normally), and the serverside cache is rotated.
It is only possible to fetch a cache of the last (at the moment, 30) peers who connected to the network, so you could only map that many users using that information. Furthermore, when packets are created by a user, the user initializes the "initial ttl" ak "hopscore" to a semirandom value, in other words Jitter is added to the initial hopscore meaning that even immediate peers cannot use hopscores to map the network :)

More on the Protocol tommorow :)
Posted on 2004-12-26 07:39:48 by Homer
As mentioned previously, when a Packet is first created, it is stamped with a TTL value which is a ranged random.
When packets pass from peer to peer, their TTL is decremented until it reaches zero, in which case the Packet expires (is not relayed).

Let's look at the next few enumerated Packet Types, the StringSearch and HashReply, and the HashRequest..
StringSearch is a request for users sharing a file by name.
A StringSearch packet's Header carries the VirtualID of the user who made the query, which is used to direct responses back to them.
The Payload which follows the PacketHeader contains a ZeroTerminated Search String.
The String contains one or more Terms separated by spaces.
The Search Algorithm searches for various wildcard patterns based on the SearchTerm(s).
Should a Client find one or more matches in their SharedFiles, they eill respond with a HashReply packet containing one or more ResultEntries consisting each of a FileName/FileHash pair, where the Hash represents the MD5 hash of the (complete) file content.
We can think of a StringSearch as a means of associating FileNames with FileHashes. The resulting FileName/Hash pairs are enlisted in the SearchReplies control as clickable entries.
DoubleClicking on a SearchResult will cause our fifth packet type to be issued: the HashRequest. This is a request for users who share a specific file identified by its hash, you can think of it as a "request for all users who share a SPECIFIC file" to advertise their VirtualID..
More later :)
Posted on 2004-12-26 08:00:07 by Homer
Hi Homer
I look foreward to seeing this implamented.
Will you be adding instant messaging? It would be nice to have IMing using public/private key encryption (to thwart spying peers) but maybe this would be beyond the scope of a file sharing app.
I think a host should keep a cache of the virtual IDs who have downloaded peices from him that way when a host responds to a string or hash search he can respond with a list of hosts who have peices.
My only concern is how fast this can realy be. If a user is using most of his bandwidth to download then there will be little available for forwarding peices.



when packets are created by a user, the user initializes the "initial ttl" ak "hopscore" to a semirandom value, in other words Jitter is added to the initial hopscore meaning that even immediate peers cannot use hopscores to map the network :)

:)

good luck
Posted on 2004-12-26 18:04:14 by QuantumMatrix1024
One potential DoS attack is to flood the ip caches with non-client ip addresses, but you would need to attack all of the caches at once, and the attack would only be successful until the next legitimate client attempts to fetch a list of peers and is logged.. a higher number of seeder hosts makes this a practical impossibility, and if there were other potential sources of seeds the probability of a successful DoS attack approaches negligability.

Now back to the Protocol :)
Let's assume a new Client wishes to join the network.
Their application firstly visits one of a number of web-based providers of "real random numbers" and fetches a value to be used as Random Seed. This is used along with some local hardware data to generate the user's VirtualID. If the Client is not Firewalled, it will start a Server listening for incoming connection attempts. Next, the client will visit one of a number of Seed hosts and fetch a list of peer ip addresses, then one of its threads begins consuming those ip addresses by making outgoing connection attempts. Code is in place to ensure that for any two peer clients only a single session between them is established - regardless of who made the first successful connection to whom.
We abstract these interpeer sessions and think of them as "Doors" which lead to other peers. A Door is bidirectional - it doesn't matter who opens the Door, once it is opened traffic can flow in both directions. We "mark" our Doors with the UserID/HopScore pairs of users who lay beyond it.
This mechanism is used A) to route packets via the most apparently efficient path, and B) to reduce "packet loops" caused by feedback loops in the network topography (imagine three clients in a triangle formation).
Most of the packet types in the protocol are aimed towards a particular end-user via their VirtualID, but a couple of packet types support "broadcasting" (because no knowledge of a given resource has yet been acquired).. Broadcast packets are sent out via "all Doors with the exception of the Door the packet arrived through, and with only a 20% chance of being sent through Doors that lead to the Sender" (since the packet is marked with the VirtualID of its creator).
Imagine we have two Doors leading to Peer A and Peer B, and unbeknown to us, A and B are ALSO peers (triangle topography).
We send out a broadcast packet to peers A and B, A will relay it to B, and if we are unlucky due to timing, B will also relay it to A. The packet has an 80% chance of expiring on A and B (regardless of its TTL!) because the peers are only 20% inclined to send the packet back in your "direction" (any Door that leads to You).
This is a tweakable threshold value which probably should be part of a more complex formula that takes into account for example the ratio of Doors that do and do not lead "back to Sender", and the HopScores for each Door leading to Sender, this will need to be "felt out" in the real world to find a happy medium.

Next time I'll start talking about the FileTransfer mechanism which is based loosely on BitTorrent protocol.
Posted on 2004-12-26 18:22:31 by Homer
Open Relay Chat was implemented in an earlier version of the protocol and was used during early betatesting to identify network loops in what is inherently an unmappable environment.
Chat was removed from the protocol because unlike "conventional" filesharing protocols, HP2P sends both network messages AND filedata across the p2p network. Since this protocol is an attempt to address the (quite poor) transfer rates achievable under MUTE protocol, it seemed unreasonable to add the overhead of chat packets which would choke the network, reducing everyone's xfer rates. It's more reasonable to implement chat as a side-protocol which requires no real anonymity or simply use a separate chat. Besides that, it wouldn't be difficult to create a client which deliberately floods the network with spurious chat packets to disrupt the network and annoy as many users as possible, with all users potentially contributing to the ensuing storm.

In short, there are no plans to support chat in the current protocol.
Posted on 2004-12-26 18:51:28 by Homer
EvilHomer2k, I greatly like your idea and the way you're structuring it. I agree that implementation of chat protocol inside will make it more complicated than it should be and it will bug the peer network. I've got two questions - is there going to be implemented a ratio subsystem and are you planning on implementing another additional way of identifying peers - not just this unique hash?



/siddhartha
Posted on 2004-12-29 09:54:22 by siddhartha
I'll answer your two questions in reverse order.
2) There will be NO other way to identify unique users than the pseudorandom session hash. That is the whole point of the exercise.
1) There is an informal ratio subsystem implemeted within the protocol which will be discussed in due course, but in short laymans terms, if you refuse a request for a piece of a file from a peer, that peer wil ban you.
Posted on 2004-12-30 08:48:44 by Homer
Thanks for the plain answers, I'm pretty happy with them! I like your approach to the implementation of this protocol. So the first answer was very important for me - I think that this is very sane and mature way of doing these kind of jobs in these unsure times :) As for the second answer - I think that this ratio subsytem should be implemented very carefully and precisely, because it could be the starting point of lot disbalances in the whole system. Thanks for the input, mate, I appreciate your work!



/siddhartha
Posted on 2004-12-30 12:00:08 by siddhartha
HP2P protocol is (as I've mentioned) based largely on two existing protocols, namely MUTE and BT.
Two of the nicer features of BitTorrent (imho) are that files are transferred nonlinearly, and of course its anti-leech mechanism.
HP2P implements these in a very similar manner.
Files are divided into one or more "pieces". The current version of HP2P does not leave the decision of piece size to the creator of the shared resource - piece size is static and hardcoded to be 8192 bytes.
The relatively small piece size should mean less retransmissions due to packet corruption. For each file being shared, the client keeps a small bitmap which keepts track of which pieces of the file are owned.
When a download begins or is resumed, sharing users swap their piecemaps, and can thus make educated requests for pieces which they need and which are available, and yet still do so in random order.
Thus you could download one piece somewhere in the middle of some mpeg, and already be sharing your one piece.
This means that the creator of the shared resource really only needs to upload all the pieces once, even though the pieces may be distributed across N users, the creator can disappear at this point and sleep easy.

An example case:
You wish to resume a download, but your client has no knowledge of peers who share it. Your client will issue a broadcast request for users who share that specific file, by Hash. Caring, sharing peers will respond with their piecemaps. Your client now has sufficient data to begin making requests to specific peers for specific pieces. Let's suppose your client makes one or more such requests, and one of the requests times out.
The peer has actively refused your request for a piece of a file you know they have. The client is not adhering to the protocol !! We will dump them from our own list of peers, and disregard all requests from Them to Us, for the remainder of the session or until a second timeout is reached.

By not sharing, they are only hurting their own transfer rates, because we are potentially routing traffic to them, and we've banned them.
This mechanism encourages filesharing, which is what a filesharing network is all about, right?

I'm still pressed for time here but I'll try to maintain daily posting on this protocol until I have emptied my bowel.

Happy New Year to all :)
Posted on 2005-01-01 22:57:14 by Homer
Happy new year to all from me! Thanks for the good explainations! I'm looking forward for hearing rom you here on the board - this topic is very important for me and I hope I can help with my ideas!




/siddhartha
Posted on 2005-01-02 01:27:17 by siddhartha
At the moment, the client is storing piecemaps for incomplete downloads in one folder, then theres the main downloads folder, and then a third folder for files being shared.
The client does some stuff with them on startup and shutdown, it's a poor mans implementation and I make no apology :)
The Piecemap binary files are named the same as the original filename, but are much smaller - generally around 30 to 100 bytes.
As well as the PieceMap, they contain the original FileSize and Hash. Because of this I prefer to call them HashMaps.
At startup, the HashMap and Shares folders are scanned.. for each FileBeingShared, the HashMap folder is scanned for a HashMap whose name matches - if not found, we will hash the Shared file and create a local HashMap.. this saves us having to ever rehash the local file, which can take time for very large files.
Let me repeat, for each FileBeingShared there should be a corresponding HashMap file.
This system is the same used by the FilesBeingDownloaded... when you begin a download, a HashMap will be created which will be reloaded next time the client starts, so resume can be totally automatic and start quickly.
Posted on 2005-01-02 09:35:43 by Homer
Will you be adding hashes for individual peices? I think thats how BitTorrent dose it.
Posted on 2005-01-02 11:20:47 by QuantumMatrix1024
Yes :)
All HP2P packets include a crc which is in reality a hash of the packet content. Please see the packet header structure previously described :)
Posted on 2005-01-02 22:09:50 by Homer
Hi Homer

Yes :)
All HP2P packets include a crc which is in reality a hash of the packet content. Please see the packet header structure previously described :)


I mean do you plan to have the downloader store hashes for each peice so that he can verify he received an unmodified peice. Doing a crc of each packet protects against corruption but not intentional modification (because the crc of the packet would be modified as well)
Imagine somone deliberatly modified a peice and updated the crc of the packets containing it, the entire file would fail hash check and it would be impsosible to re-request a peice since you have no way to verify which peice of the file was modified.
I hope I've explained what I mean.
Posted on 2005-01-03 02:09:47 by QuantumMatrix1024
Yes :) The HashMap file format is as follows:

The HashMap File Format:
00-0F The first 16 bytes = the md5 hash of the complete file.
10-13 Following that, a dword containing the complete file's size,
14-17 then another dword containing the "number of filepieces" (#Pieces),
18-1B the number of bytes in the piecemap,
1C-?? Array of PieceHashes (#Pieces * SIZEOF MD5HASH)
??-?? and finally, ( #Piecess/8 )+1 bytes of "NeededPieces BitMap"


As you can see, the current version only supports 32bit filesize.
The underlying file io class supports a full 64bit however, and it will be quite simple to remedy this issue later.

The NeededPieces BitMap is a linear bit array, thus the bit which identifies Piece#N is at bit offset N in the binary array. 0 indicates a piece we Have, while 1 indicates a piece we Need. Therefore, when we begin a NEW download (and create a "blank hashmap file"), the bitmap contains all 1's, ie all FF bytes..
Posted on 2005-01-03 04:50:15 by Homer
I've already discussed the first packets of the protocol enough (those pertaining to the client/server handshake). What I'd like to do next is start discussing the protocol in terms of hypothetical examples.
Let's now assume that one or more peer sessions are established and in the "idle" connected state.

The first thing a new user is likely to do is make a string-based search.
StringSearches are a broadcast packet - they are "sent out all Doors except the Door through which they arrived".
The packet format is quite simple, it's a standard HP2P packet header, with the zeroterminated "search string" as the "packet payload".
The SearchString may consist of one or more SearchTerms, with space as the separator.
The StringSearch packetheader carries the VirtualID of its creator.
Peers receiving such a packet will perform a local search as well as propagating the search to other peers.
Since all unique HP2P packets contain a unique packethash, each client maintains a small database of "recently seen packets" in order to reduce "feedback loops" being established.

Only peers who have a positive response need reply to the StringSearch request...they do so using "the Door with the best HopScore for the creator user - which usually = the Door via which the request arrived", thus their reply is back-propagated efficiently to the originator.
The reply to a StringSearch request contains an array of one or more of the following element:
FileNameString (zeroterminated)
FileHash for Full File
FileSize as dword
The format for a HashReply packet is also quite simple.
The standard HP2P header is followed by the VID of the respondant user, and then comes an array of the above element,
and the array is zeroterminated.
The packetheader contains the VID of the originator, unaltered.

Once the Originator of the Search receives one or more of these Replies, their client will parse the results into their SearchResults listview.
Now the user has a list of filenames and filehashes, they may initiate their first ever Download. Resuming a Download requires no StringSearches.
Posted on 2005-01-04 23:36:39 by Homer