Scientific data transfers have grown so large that previously rare transmission errors in the Internet are now corrupting some of them. The Internet’s error checking mechanisms were designed at a time when a megabyte was a large file; today files can contain terabytes, and the old mechanisms are in danger of being overwhelmed. This project seeks new error checking mechanisms that will let the Internet move tomorrow’s scientific data efficiently and without errors.
This project addresses two fundamental issues. First, the Internet’s checksums and message digests are too small (32 bits) and are probably poorly tuned to today’s error patterns. It is a little-known fact that checksums can (and typically should) be designed to reliably catch specific errors: a good checksum protects against the errors it will actually encounter. So the first step in this project is to collect information about the kinds of transmission errors currently occurring in the Internet - the first such study in 20 years. Second, today’s file transfer protocols, if they find that a file has been corrupted in transit, simply discard the file and transfer it again. In a world in which a single file can be huge (tens of terabytes or even petabytes), that is a tremendous waste; instead, the file transfer protocol should repair the corrupted parts of the file. As the project collects data about errors, it will also design a new file transfer protocol that can incrementally verify and repair files.
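To illustrate the idea of incremental verification and repair (this is only a minimal sketch, not the protocol the project will design), a receiver can check a file block by block against per-block digests supplied by the sender and re-request only the blocks that fail. The block size, the use of SHA-256, and the `refetch_block` callback below are all illustrative assumptions rather than features of any existing transfer tool.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks; an illustrative choice


def block_digests(path, block_size=BLOCK_SIZE):
    """Compute a digest for each fixed-size block of a file (sender side)."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def verify_and_repair(path, expected_digests, refetch_block, block_size=BLOCK_SIZE):
    """Verify a received file block by block and rewrite only the blocks
    whose digests disagree with the sender's.

    `refetch_block(index)` is a hypothetical callback that asks the sender
    to resend block `index`; it stands in for whatever re-transfer
    mechanism a real protocol would provide.
    """
    repaired = 0
    with open(path, "r+b") as f:
        for i, expected in enumerate(expected_digests):
            f.seek(i * block_size)
            block = f.read(block_size)
            if hashlib.sha256(block).hexdigest() != expected:
                good_block = refetch_block(i)   # re-transfer just this block
                f.seek(i * block_size)
                f.write(good_block)
                repaired += 1
    return repaired
```

The point of the sketch is the cost model: a detected error triggers the re-transfer of one block rather than the whole multi-terabyte file.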
This project will improve the Internet’s ability to support big data transfers, both for science and commerce, for decades to come. Users will be able to transfer big files with confidence that the data will be accurately and efficiently copied over the network. This work will further NSF’s “Blueprint for a National Cyberinfrastructure Ecosystem” by ensuring a world in which networks work efficiently to deliver trustworthy copies of big data to anyone who needs it.