Tuesday, September 13, 2005

BitTorrent Explained


BitTorrent is sort of a download manager.

(Note: The following is intended for newcomers to the mere idea of BitTorrent. See this page for the actual protocol documentation.)

The problem

Imagine this. Everybody wants to download the latest 100MB patch for
Super Game 76. So Super Game 76's makers, Genericasoft, upload the
patch to their servers so that all of Super Game 76's 10,000 players
can download it.
But Super Game 76 is insanely popular. This means that in a
relatively short space of time, Genericasoft's servers are going
to have to provide 10,000 copies of that 100MB file. That's gonna strain
any server, right? Genericasoft runs out of bandwidth, everybody
finds their download to be running horribly slowly (if at all), nobody
gets the patch they need, and it's all bad.
Or maybe your site hosts some cool video - say 10MB - for people to
download, but then suddenly Slashdot links to you or something,
and bam! you have about a hundred thousand people trying to download
the file at once, something your server isn't prepared for, causing
it to nosedive.

The solution

If a hundred thousand people are downloading at once, that's a whole
lot of combined downloading bandwidth. But it's also a whole lot
of combined upload bandwidth... none of which is being used for anything. Can't we use that to our advantage? Yes, we can!

Suppose Genericasoft divides their 100MB patch into, say, four
hundred smaller chunks. Then, to get the whole patch, you just need
to get four hundred chunks instead of one big one. And let's suppose
that when people connect to Genericasoft to get their patch,
they don't download the chunks sequentially... they just get whatever
chunks are available until they have the complete set. And let's
suppose - this is the clever bit - that everybody knows which chunks everybody else has.
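
To make the chunk idea a bit more concrete, here is a minimal Python sketch. The function name split_into_chunks and the ten-byte stand-in for the patch are made up for this example; a real client works on files on disk, not on strings in memory.

```python
def split_into_chunks(data, chunk_size):
    """Cut data into consecutive chunks of chunk_size bytes (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Arithmetic for the example in the text: a 100MB patch in four hundred chunks.
PATCH_SIZE = 100 * 1024 * 1024   # 100MB, taking 1MB as 1024 * 1024 bytes
CHUNK_SIZE = PATCH_SIZE // 400   # 262144 bytes, i.e. 256KB per chunk

# A tiny stand-in for the patch, so the example runs instantly.
toy_patch = bytes(range(10))
toy_chunks = split_into_chunks(toy_patch, 4)

# Chunks can be fetched in any order; joining them by position rebuilds the original.
assert b"".join(toy_chunks) == toy_patch
```

The only thing that matters is that every chunk has a fixed position, so it doesn't matter which one turns up first.
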
Now you connect to a special Genericasoft server called a "tracker".
A tracker is a server dedicated to keeping track of who is currently
downloading the patch, and putting those people in touch with each
other. So Genericasoft's tracker gives you a list of everybody else
involved, and each of them tells you which chunks they have.
Genericasoft says "Right, I've got all four hundred chunks." Person A
has chunks 2 and 3 only. Person B has all four hundred chunks. Person C
just has chunks 1 to 100. Person D... and so on, until you know who has
what chunks available. Then, and this is the really clever bit, instead of getting all four hundred chunks from Genericasoft, you get each chunk from whoever happens to have it available. You could get chunk 1 from person C, chunks 2 and 3 from person A, chunk
4 from Genericasoft itself, or whatever. The point is, you don't get all the chunks from Genericasoft. The majority of them, you get from other people who are also downloading at the same time as you are.
The clear advantage here is that Genericasoft saves an awful lot of
bandwidth. Since you can download many chunks at once, it also
means you can download as fast as your personal internet connection can
manage - you aren't limited by whatever connection speed Genericasoft
is stuck with.
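
Here is a rough Python sketch of the "get each chunk from whoever has it" step, using the people from the example above. The name plan_downloads and the pick-a-source-at-random rule are assumptions made for illustration; real clients choose their sources more cleverly than this.

```python
import random


def plan_downloads(num_chunks, have, who_has, rng=random):
    """Pick one source for every chunk we are still missing.

    have    -- set of chunk numbers we already hold
    who_has -- dict mapping a source's name to the set of chunks it holds
    Returns a dict of chunk number -> chosen source. Chunks nobody holds are left out.
    """
    plan = {}
    for chunk in range(1, num_chunks + 1):
        if chunk in have:
            continue
        sources = sorted(name for name, chunks in who_has.items() if chunk in chunks)
        if sources:
            plan[chunk] = rng.choice(sources)
    return plan


who_has = {
    "Genericasoft": set(range(1, 401)),
    "Person A": {2, 3},
    "Person B": set(range(1, 401)),
    "Person C": set(range(1, 101)),
}

plan = plan_downloads(400, have=set(), who_has=who_has)
from_others = sum(1 for source in plan.values() if source != "Genericasoft")
print(f"{from_others} of {len(plan)} chunks come from other downloaders")
```

Because the choice is random, the exact count changes from run to run, but a good share of the chunks never touch Genericasoft's server at all.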

How to use BitTorrent

The above concepts were - as far as I am aware - invented by Bram
Cohen. He named the protocol "BitTorrent", and also came up with the first BitTorrent computer program ("client").
Once you have your client installed, you go to Genericasoft's
website. Genericasoft will have set up a small file, of the order
of a few dozen kilobytes, called a "torrent", freely available for
anybody to download. It'll be called something like
"sg76patch.torrent". Instead of downloading the 100MB patch, you download this relatively
tiny file instead. The torrent contains all the information about the
patch that your client needs to download it: the name and location of
the tracker that it needs to connect to, the name of the file it's
downloading, the size of the chunks it's been split up into, what order
they go in... stuff like that.
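
For the curious, here is a hedged Python sketch of roughly what goes into a torrent. The key names ("announce", "info", "name", "length", "piece length", "pieces") and the encoding ("bencoding") are the ones the real single-file torrent format uses, where chunks are called "pieces" and each one is listed by its SHA-1 hash. The tracker address, the file name sg76patch.exe and the helper names bencode and make_torrent are invented for this example.

```python
import hashlib


def bencode(value):
    """Serialise ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_torrent(name, data, chunk_size, tracker_url):
    """Describe a single file: where the tracker is, plus one SHA-1 hash per chunk."""
    hashes = b"".join(
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    )
    return {
        "announce": tracker_url,          # where to find the tracker
        "info": {
            "name": name,                 # name of the file being shared
            "length": len(data),          # total size in bytes
            "piece length": chunk_size,   # size of each chunk
            "pieces": hashes,             # 20 bytes per chunk, in order
        },
    }


# Hypothetical values, with a tiny stand-in for the real 100MB patch.
toy_patch = b"pretend this is a 100MB patch"
torrent = make_torrent(
    "sg76patch.exe", toy_patch, 8, "http://tracker.genericasoft.example/announce"
)
torrent_file_bytes = bencode(torrent)
print(len(torrent_file_bytes), "bytes of torrent metadata")
```

The hashes are what let your client check that each chunk it receives from a stranger really is part of the patch.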

Then you double-click on it to open it. .txt files open in Notepad.
.doc files open in Word. .torrent files open in BitTorrent! Your
BitTorrent client will open up, ask you to select a location to save
the patch, and whizz away, finding chunks and downloading them until it's got all four hundred (or however many there are; the number of chunks can vary hugely). As with KaZaA, Direct Connect and so on, all this downloading can be done in the background while you do other things, and could take any amount of time, depending on the size of the patch, how much bandwidth you have, and how many other people are downloading it at the same time as you. When it's done, it'll
stitch the chunks back together and say "I'm done!" Then you can close
BitTorrent and get on with installing your patch.
Note that BitTorrent is not in any sense searchable, like KaZaA or
eMule or a Direct Connect hub. You can't just run it and type "Super
Game 76" to find the patch you're after. Instead, you have to go out
there on the big wide internet and find the torrent you're after
manually.

BitTorrent etiquette

As you may have figured out, if you are downloading chunks from
other people using BitTorrent, then they must be uploading chunks to
you. Similarly, other people will be downloading chunks from you. If you are downloading, you must upload! Otherwise, the entire exercise is pointless, and Genericasoft might as well serve every patch individually all on their own. All BitTorrent clients will force you
to upload as well as download. Most of them will also keep a record of
how much you've uploaded compared to how much you've downloaded (your
share ratio). Ideally, to keep the universe in karmic balance as it
were (and to preserve the BitTorrent network), you should upload as
much as you download; i.e. your share ratio should ultimately be 1.00
or higher.
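
The share ratio arithmetic is simple enough to sketch in a few lines of Python. The function names are made up, and treating "nothing downloaded yet" as an infinite ratio is just a convention picked for this example; individual clients may display that case differently.

```python
def share_ratio(uploaded, downloaded):
    """Uploaded amount divided by downloaded amount (same units for both)."""
    if downloaded == 0:
        # Assumption for this sketch: a pure uploader counts as an infinite ratio.
        return float("inf")
    return uploaded / downloaded


def left_to_upload(uploaded, downloaded, target=1.0):
    """How much more must be uploaded before the ratio reaches the target."""
    return max(0, target * downloaded - uploaded)


# You have downloaded the whole 100MB patch and uploaded 60MB so far.
print(share_ratio(60, 100))      # 0.6
print(left_to_upload(60, 100))   # 40.0, i.e. another 40MB to reach 1.00
```
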
People who have all of the chunks no longer need to download
anything, so they can quit out. However, you can leave BitTorrent open
and stick around anyway, letting people download anything they need from you. If
you do this you are called a "seed". Without at least one seed, it
should be obvious that any BitTorrent network will collapse, because sooner or
later there will be a chunk which nobody
has, meaning nobody can finish their download. In this example, Genericasoft's server would
probably be a seed, but if you're feeling nice, you can seed too. The
chances are that when your download has finished, your share ratio will
be less than 1.00, so in this case you should at least seed until you
reach 1.00.
Not that there's any reason you should stop there. In fact, you're
entirely free to keep seeding for as long as you like. You can even
open up the torrent again at a later date, after you're all done, and just
do a little more seeding on general principle.
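
Why things fall apart without a seed can be shown with a small Python sketch. The function name missing_chunks and the cast of characters are assumptions carried over from the earlier example: it takes a record of who holds which chunks and reports the chunks that nobody holds.

```python
def missing_chunks(num_chunks, who_has):
    """Return the set of chunk numbers that no participant holds."""
    available = set().union(*who_has.values())
    return set(range(1, num_chunks + 1)) - available


with_seed = {
    "Genericasoft": set(range(1, 401)),   # a seed: holds every chunk
    "Person A": {2, 3},
    "Person C": set(range(1, 101)),
}
# The same crowd after the only seed has gone offline.
without_seed = {name: chunks for name, chunks in with_seed.items() if name != "Genericasoft"}

print(len(missing_chunks(400, with_seed)))      # 0: everybody can finish
print(len(missing_chunks(400, without_seed)))   # 300: chunks 101 to 400 are gone
```

With the seed gone, persons A and C can swap what they have between themselves all day long, but neither will ever complete the patch.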

Different BitTorrent clients

Cohen's original client is very
simple, and short on features. Fortunately, BitTorrent is
an open-source program, meaning that any programmer can take a look at
Cohen's code, and add bells and whistles of his own. This has resulted
in a bundle of other BitTorrent clients being made independently. My
preferred client is Azureus, which is much more powerful and versatile than the original BitTorrent client. It allows you to run several torrents at once, throttle them to
different levels, set priorities on them, and even create your own torrents.
There are many more out there though.

How the illegal downloading scene works

We've established that BitTorrent is a good way to get content to
lots of people in a short space of time without expending huge amounts
of bandwidth. What we also find is that BitTorrent is a great way to
distribute copyrighted material.
It works like this. Instead of it being something legal (a game
patch or the latest Linux distribution), the file being distributed can
easily be illegal, or at least of dubious legality: an album (you can
distribute many files at once using the same torrent), the CD or DVD
images of a new game, a movie, a piece of software or the latest episode of a TV show.
And instead of somebody reputable like Genericasoft or a Linux
developer group, the person who makes and distributes the relevant
torrent and maintains the tracker can be anybody. There are in fact a
bundle of groups of individuals which churn out tonnes of these torrents.
Distributing the torrents for these movies/games/videos becomes a little more complicated as, while technically a site providing torrents isn't directly providing any copyrighted material, it is still legally very dodgy ground. There are a whole bunch of sites dedicated to
circulating these torrents, but finding them isn't trivial.

Pros and cons of BitTorrent compared to other P2P applications

BitTorrent is best consumed as part of a balanced diet of P2P apps
because, like all P2P apps, it has both strengths and weaknesses.

Pros:

  • No leeching is permitted. Yes, this IS a pro if you think about it.

  • Alleviates server strain as described above.

  • Unlike KaZaA, there's no underlying network of nodes or servers
    which can be shut down. As long as torrents are made and distributed,
    and trackers remain online, BitTorrent will continue to run.

  • BitTorrent is totally free. It does not and never will contain adverts.

  • BitTorrent is open-source. Anybody can make a BitTorrent client with whatever features he wants.

  • No spyware, adware, malware, popups, or other undesirables are bundled with it.

Cons:

  • No leeching is permitted. You have to upload as well as
    download. All BitTorrent clients which allow throttling will adjust
    your download throttle to match your upload throttle: if you're not
    letting people download from you then you won't be allowed to download
    from them.

  • Not searchable. Big sites full of torrents are easy to find, but no single
    site carries all the torrents; if you're looking for something obscure
    then you may have to look a long way.

  • BitTorrent is good for downloading stuff which is popular now,
    because the number of seeds and peers will be large. Old, obscure and
    non-mainstream material, on the other hand, is difficult to find.
    Looking for an album that's at number one in the charts right now? No
    problem. Searching for an obscure EP from 1998? Even if you manage to
    find a torrent, it's entirely possible that the tracker might have been
    taken offline, or that there are no seeds, leaving you up the creek.

  • BitTorrent is totally public and hence insecure. While nobody can use it to send
    you viruses or anything like that, your IP address is visible to
    everybody else using the torrent. If you're using BitTorrent, you can
    be tracked down easily.
