Abstract
With the rise in the amount of information being streamed across networks, there is a growing demand to vet its quality, type, and content for purposes such as spam detection, security, and search. In this paper, we develop an energy-efficient, high-performance information filtering system that is capable of classifying a stream of incoming documents at high speed. The prototype parses a stream of documents using a multicore CPU and then performs classification using Field-Programmable Gate Arrays (FPGAs). We implemented a Naive Bayes classifier on our prototype and compared it to an optimized CPU-based baseline on a large TREC data collection. Our empirical findings show that we can classify documents at 10 Gb/s, which is up to 94 times faster than the CPU baseline (and up to 5 times faster than previous FPGA-based implementations). In future work, we aim to increase the throughput by another order of magnitude by implementing both the parser and the filter on the FPGA.
Original language | English |
---|---|
Title of host publication | CIKM '13 Proceedings of the 22nd ACM International Conference on Information & Knowledge Management |
Place of Publication | New York, NY, USA |
Pages | 1245-1248 |
Number of pages | 4 |
Publication status | Published - 27 Oct 2013 |
Keywords
- classification
- parsing
- fpga
- filtering