US 7,599,931 B2
Web forum crawler
Bin Shi, Beijing (China); Gu Xu, Beijing (China); and Wei-Ying Ma, Beijing (China)
Assigned to Microsoft Corporation, Redmond, Wash. (US)
Filed on Mar. 03, 2006, as Appl. No. 11/368,261.
Prior Publication US 2007/0208703 A1, Sep. 06, 2007
Int. Cl. G06F 7/00 (2006.01); G06F 17/30 (2006.01)
U.S. Cl. 707—6  [707/4] 14 Claims
 
1. A system with a processor and memory for crawling a site having pages, each page having a reference that identifies the page, each reference having tokens, comprising:
a grouping component that identifies groups of pages with similar content;
a pattern component that identifies a reference pattern of a group based on the references of the pages of the group, the reference pattern being identified by analyzing the tokens of the references of the pages of the group to identify sequences of tokens indicating a pattern of tokens within the references; and
a decision component that, after encountering a reference that matches a reference pattern when crawling the site, decides whether to access the page of the encountered reference based on characteristics of the pages of the group of the matching reference pattern,
wherein the components are implemented as computer-executable instructions stored in the memory for execution by the processor.
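
For illustration only, the pattern and decision components recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the delimiter-based tokenization, the "*" wildcard, the same-length requirement, the "useful" characteristic, and all function names are hypothetical choices made for the example, and the grouping of pages by similar content is assumed to have been done already.

```python
import re


def tokenize(reference):
    """Split a reference (URL) into tokens on common delimiters (assumed tokenization)."""
    return [t for t in re.split(r"[/?&=.:]+", reference) if t]


def reference_pattern(references):
    """Derive a token pattern for one group of pages.

    Tokens that are identical at a position across every reference are kept;
    positions that vary become the wildcard "*". Returns None when the group is
    empty or its references differ in token count (a simplifying assumption).
    """
    token_lists = [tokenize(r) for r in references]
    if not token_lists:
        return None
    length = len(token_lists[0])
    if any(len(tokens) != length for tokens in token_lists):
        return None
    return tuple(
        column[0] if len(set(column)) == 1 else "*"
        for column in zip(*token_lists)
    )


def matches(pattern, reference):
    """Check whether a reference's tokens fit a pattern produced above."""
    tokens = tokenize(reference)
    return len(tokens) == len(pattern) and all(
        p == "*" or p == t for p, t in zip(pattern, tokens)
    )


def should_access(reference, groups):
    """Decide whether to fetch a page from the characteristics of its matching group.

    `groups` is a list of (pattern, characteristics) pairs; the boolean
    "useful" characteristic is a hypothetical stand-in for whatever group
    properties a real crawler would consult.
    """
    for pattern, characteristics in groups:
        if matches(pattern, reference):
            return characteristics["useful"]
    return True  # assumption: crawl references that match no known pattern


# Hypothetical forum references, already grouped by similar page content.
thread_refs = [
    "http://forum.example.com/viewtopic.php?t=101",
    "http://forum.example.com/viewtopic.php?t=202",
]
profile_refs = [
    "http://forum.example.com/profile.php?u=7",
    "http://forum.example.com/profile.php?u=9",
]
groups = [
    (reference_pattern(thread_refs), {"useful": True}),
    (reference_pattern(profile_refs), {"useful": False}),
]
```

With these hypothetical groups, a newly encountered thread reference matches the first pattern and is fetched, while a newly encountered profile reference matches the second and is skipped without being downloaded.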