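If it is a paper you absolutely have to understand, there is no way around a hard slog: fill in the background, ask other people, and work at it until you get it. Otherwise, read for the gist first, set the details aside, and come back for a close reading when you need it (by then you may also have the background). As for the gist, I think at least the following points need to be clear (other readers are welcome to add to this): 1. The problem being solved (ideally with some understanding of the background in which the problem was posed); 2. The known conditions; 3. The assumptions (sometimes implicit ones that the author does not state, so you need to train a sharp eye, heh); 4. The general approach to the solution (many papers give a fairly intuitive explanation of the approach); 5. The main conclusions (whether the problem is fully or only partially solved, whether there are important intermediate results, and so on); 6. Comparison with other methods; 7. Shortcomings (that is, where future work can continue). Of these, 1 is the most important, followed by 2, 3, 4, and 5, and finally 6 and 7. This is all purely personal experience, though, so take it as you see fit :)
88 Information extraction - Wikipedia, the free encyclopedia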
Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured or semi-structured information from unstructured machine-readable documents. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. Current approaches to IE use natural language processing techniques that focus on very restricted domains. For example, the Message Understanding Conference (MUC) is a competition-based conference that focused on the following domains in the past:
87 MySQL Security Guide
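As a MySQL system administrator, you are responsible for maintaining the data security and integrity of your MySQL database system. This article describes how to build a secure MySQL system, offering guidance from two perspectives: inside the system and the external network.
86 MySQL Advanced Features: Comparison with Other Databases - MYSQL - Technology World - CCID Net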
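For a true comparison of speed, consult the maturing MySQL benchmark suite. See 10.8 Using Your Own Benchmarks. Because there is no thread creation overhead, a smaller parser, fewer features, and simple security, mSQL should be faster in the following areas:
85 What Is a Massive Data Mining Engine? -- DoNews.com -- IT Community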
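Traditional keyword search engine technology emerged at the end of the last century. By offering full-text search over web page text, it gave users a fast way to query web pages and greatly improved the usability of web information. However, as the number of web pages has grown rapidly and content is repeatedly cited, the listed search results have become harder and harder to use. Advances in multimedia and broadband technology have also made network resources increasingly diverse; these resources have different quality criteria and different characteristics, so ranking them together rarely gives satisfactory results. The user population is getting younger and its average level of knowledge is falling, so users are less able to master search techniques and filter results. The rise of enthusiast communities in different fields has also raised the demand for personalized and specialized search results.
84 Block-Level Link Analysis - What Does It Mean To You?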
Microsoft's research lab has released a paper in which they discuss a new way to rank web sites. The new method is called block-level link analysis.
83 VIPS: a VIsion based Page Segmentation Algorithm
The VIsion-based Page Segmentation (VIPS) algorithm aims to extract the semantic structure of a web page based on its visual presentation. This semantic structure is a tree; each node in the tree corresponds to a block. Each node is assigned a value (Degree of Coherence, DoC) that indicates how coherent the content of the block is based on visual perception; the larger the DoC value, the more coherent the block. The VIPS algorithm makes full use of the page layout structure. It first extracts all the suitable blocks from the HTML DOM tree, and then it finds the separators between these blocks. Here, separators denote the horizontal or vertical lines in a web page that visually cross no blocks. Based on these separators, the semantic tree of the web page is constructed. Thus, a web page can be represented as a set of blocks (leaf nodes of the semantic tree). Compared with DOM-based methods, the segments obtained by VIPS are much more semantically aggregated. Noisy information, such as navigation, advertisements, and decoration, can be easily removed because it is often placed in certain positions of a page. Contents with different topics are distinguished as separate blocks.
82 Google phone interview process
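Because I had applied for the Wireless Developer position, he asked whether I had done any J2ME or mobile application development. I had not, so I had to say so honestly, though I mentioned that I had worked on protocol stack development. He was clearly not interested in that and did not ask further. For all the remaining time I was answering one algorithm question he gave me. It took more than 40 minutes, and in the end he basically told me the algorithm himself, which was very embarrassing. Thinking about it now, it should have been a simple problem, and I don't know why I couldn't work it out at the time, which is embarrassing again. I'd advise anyone applying for a development position to get solid basics in algorithms. I never studied this area systematically and am weak at it. I'll post his question here for anyone interested: Existing arrays record the positions at which each word occurs in a document. Given k input words, find the shortest span of positions that contains those k words. For example, three of the arrays are:
hello -> 5 14 19 35 52
world -> 11 17 29 40
goodbye -> 1 25 63 72
The numbers after each word are the positions at which it occurs in the document. If the input is hello world goodbye, what is the shortest span?
81 :::Issues to Consider When Implementing a Data Mining Project:::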
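For the interview question quoted in entry 82 above, here is one possible sketch, not the interviewer's answer (the post does not give it). It assumes that "shortest span" means the narrowest window of positions that contains at least one occurrence of every input word, and that each position list is sorted and non-empty. The function name `shortest_span` is made up for illustration. The idea is to keep one candidate position per word in a min-heap and repeatedly advance the word whose candidate is leftmost.

```python
import heapq


def shortest_span(positions):
    """Return (start, end) of the narrowest window holding one position of every word.

    `positions` is a list of sorted, non-empty position lists, one per word
    (an assumption of this sketch).
    """
    # Each heap entry: (position, word index, index into that word's list).
    heap = [(plist[0], w, 0) for w, plist in enumerate(positions)]
    heapq.heapify(heap)
    current_max = max(plist[0] for plist in positions)
    best = (heap[0][0], current_max)

    while True:
        pos, w, i = heapq.heappop(heap)  # leftmost candidate in the window
        if current_max - pos < best[1] - best[0]:
            best = (pos, current_max)
        if i + 1 == len(positions[w]):
            # This word has no later occurrence, so no narrower window exists.
            return best
        nxt = positions[w][i + 1]
        current_max = max(current_max, nxt)
        heapq.heappush(heap, (nxt, w, i + 1))


hello = [5, 14, 19, 35, 52]
world = [11, 17, 29, 40]
goodbye = [1, 25, 63, 72]
print(shortest_span([hello, world, goodbye]))  # (17, 25)
```

With the lists from the post, the window runs from position 17 (world) to 25 (goodbye) and includes hello at 19. Each position is pushed and popped at most once, so the running time is O(n log k) for n total positions and k words.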
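Data mining should be considered from three aspects: first, what kind of business problem it is being used to solve; second, the data preparation needed to carry it out; and third, the various analysis algorithms.
80 :::Data Mining Applications:::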
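It should be emphasized that data mining technology has been application-oriented from the very beginning. Today, "data mining" is a fashionable term in many fields, especially commercial ones such as banking, telecommunications, insurance, transportation, and retail (for example, supermarkets). Typical business problems that data mining can solve include: database marketing, customer segmentation |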