With the rapid development of cloud computing, the volume of information is growing explosively. Cheap cloud storage and computing power have also contributed to the generation and application of big data. More than 50% of big data is unstructured, so much of it is stored as files in a file system. The data is divided into many parts that are stored on chunk servers, and the corresponding metadata is generated and stored on the master server. How to collect web URLs and terms, and how to retrieve them, is then investigated.