After reading through topic1192.html, it seems to me that we have come a long way in distributed databases. There are a lot of calculations in that thread about how long it would take to load a 1 TB file and search it for a hash, but with some of the newer databases the lookup becomes quite painless, for example if we load a table into Voldemort (http://project-voldemort.com/). Or we could even use MongoDB, which does sharding, partitioning, fault tolerance, redundancy, and a whole bunch of other stuff automagically, and which could make those lookup times extremely small if the data were distributed across enough machines.
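Just to illustrate how simple the lookup side gets, here is a minimal sketch against a sharded MongoDB collection. The connection string, database name ("rainbow"), collection name ("tables"), and the field names "hash" and "plaintext" are all hypothetical, not anything from an existing project:

```python
# Minimal hash-lookup sketch, assuming documents of the form
# {"hash": "<md5 hex>", "plaintext": "<cracked value>"} in a collection
# that MongoDB shards/replicates across the cluster for us.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")       # hypothetical connection string
collection = client["rainbow"]["tables"]                  # hypothetical db/collection names
collection.create_index("hash")                           # make lookups an index hit, not a scan

def lookup(hash_value):
    """Return the cracked plaintext for a hash, or None if we don't have it."""
    doc = collection.find_one({"hash": hash_value})
    return doc["plaintext"] if doc else None

print(lookup("5f4dcc3b5aa765d61d8327deb882cf99"))          # md5("password")
```

With the shard key on the hash field, each query only touches the machine(s) holding that range, which is where the "extremely small" lookup times come from.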
Ideally, implementing a distributed file system for this project would be the best way to go. All the connected users store a part of the data, and with redundancy and replication you could minimize the risk of losing any of it. Users could then ask the site "what's the cracked version of this hash?", and a task would be sent out to everyone asking "who has this hash?". Users who have it would simply return the plaintext.

Of course, the amount of data a single user would need to store is somewhat significant. Using the magic number 7, so that at any time 7 machines hold the same chunk of data, and given 2,731 GB of data and only 1,654 online machines, each user would need to store at least 11.56 GB: (2,731 GB * 7 replicas) / 1,654 machines = 11.56 GB. People who already have entire sets of tables could enable a setting offering to store a larger share, which would reduce the minimum load on the other users.
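A quick back-of-the-envelope check of that per-machine figure, using only the numbers quoted above (the variable names are just for illustration):

```python
# Per-node storage needed so that every chunk exists on `replication` machines.
total_data_gb = 2731      # current size of the table data, from the post
replication = 7           # copies of each chunk kept online at any time
online_machines = 1654    # machines currently connected

per_node_gb = total_data_gb * replication / online_machines
print(f"{per_node_gb:.2f} GB per machine")   # -> 11.56 GB per machine
```

Note the figure scales linearly: double the online machines and the minimum per-user share drops to about 5.8 GB, which is another argument for making participation as easy as possible.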