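Let's take a brief look at how to work with hbase. The biggest differences between hbase and the databases we normally use are column-oriented storage and the absence of data types: all data is stored as strings (strictly speaking as raw bytes, which is why the code below calls getBytes() on every value). In addition, if an hbase table has 5 fields but only 4 of them actually hold a value, the field that is null takes up no space at all. This is a nice property, and worth comparing against the databases we normally use.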
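To make the "empty fields take no space" point concrete, here is a minimal sketch in plain Java. It does not use hbase at all; the class name SparseRowSketch and its methods are made up for illustration, and it only models the idea that a row holds entries for the columns that have a value and nothing for the rest.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one sparse row; not part of the hbase API.
final class SparseRowSketch {
    // Column name -> raw bytes. A column without a value simply has no entry.
    private final Map<String, byte[]> cells = new TreeMap<>();

    // Stores the value as bytes; null or empty values are skipped entirely.
    void put(String column, String value) {
        if (value == null || value.isEmpty()) {
            return;
        }
        cells.put(column, value.getBytes(StandardCharsets.UTF_8));
    }

    // Number of cells that actually occupy space in this row.
    int storedCells() {
        return cells.size();
    }

    boolean has(String column) {
        return cells.containsKey(column);
    }

    // Returns the value decoded as a string, or null if the column was never written.
    String get(String column) {
        byte[] raw = cells.get(column);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }
}
```

A relational row would reserve a slot for every declared column; in this model a missing value costs nothing, which is roughly the behavior described above.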
First, let's create a table, without using mapreduce for now:
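    // Assumed package layout of the early hbase release that this Text-based API belongs to
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseAdmin;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.io.Text;

    /**
     * A few shared fields
     */
    public static HBaseConfiguration conf = new HBaseConfiguration();
    static HTable table = null;

    /**
     * Create an hbase table
     * @param tablename name of the table to create
     * @throws IOException
     */
    public static void creatTable(String tablename) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(conf);
        if (!admin.tableExists(new Text(tablename))) {
            HTableDescriptor tableDesc = new HTableDescriptor(tablename);
            tableDesc.addFamily(new HColumnDescriptor("ip:"));
            tableDesc.addFamily(new HColumnDescriptor("time:"));
            tableDesc.addFamily(new HColumnDescriptor("type:"));
            tableDesc.addFamily(new HColumnDescriptor("cookie:"));
            // Note this "c" column; it is used further down to illustrate column storage
            tableDesc.addFamily(new HColumnDescriptor("c:"));
            admin.createTable(tableDesc);
            System.out.println("table create ok!!!");
        } else {
            System.out.println("table Already exists");
        }
    }

Start both hadoop and hbase, then run it. Running "desc tablename" in hql shows that the table has 5 fields. Next, let's put some data into the table. As mentioned above, a field with no value takes up no space. One more thing to watch out for: in my testing, if a field has no value, it is better not to write a null value into hbase. Writing null into the field does not cause any problem by itself, but querying a field that holds a null value misbehaves a little. I don't know how to describe it precisely; try it if you are curious. That is why the code below checks each value before writing it. Also, every row in an hbase table must have a unique row key. You are free to define the row key however you like, so choose one that makes it easy to locate exactly the data you need.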
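The post calls a createRow(time, cookie) helper but never shows it. The following is only a guess at what such a helper might look like, assuming the key is simply the two values joined by an underscore; the class name, the separator, and the validation are all assumptions, not the author's code.

```java
// Hypothetical row-key helper; the original createRow is not shown in the post.
final class RowKeySketch {
    // Builds a row key from the request time and the cookie.
    // Both parts are required, otherwise two different records could collide.
    static String createRow(String time, String cookie) {
        if (time == null || time.isEmpty() || cookie == null || cookie.isEmpty()) {
            throw new IllegalArgumentException("time and cookie are both required for the row key");
        }
        // Assumed separator; any character that cannot occur in the time part would do.
        return time + "_" + cookie;
    }
}
```

Because hbase keeps rows sorted by row key, putting the time first means records from the same period end up next to each other, which may or may not be what you want for your queries.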
    /**
     * Load data into the table
     * @param tablename name of the table created by creatTable
     * @throws Exception
     */
    public static void insertData(String tablename) throws Exception {
        // Read the log file
        BufferedReader reader = new BufferedReader(new FileReader("log file name"));
        try {
            if (table == null)
                table = new HTable(conf, new Text(tablename));
            String line;
            while ((line = reader.readLine()) != null) {
                // LogAccess parses one log line; it was explained earlier, so not repeated here
                LogAccess log = new LogAccess(line);
                // time + cookie is used as the row key so that it is unique. Duplicate
                // cookie records are handled separately, which is not covered here.
                String row = createRow(log.getTime(), log.getCookie());
                long lockid = table.startUpdate(new Text(row));
                // Check for null first, then for empty, and skip the column if there is no value
                if (log.getIp() != null && !log.getIp().equals(""))
                    table.put(lockid, new Text("ip:"), log.getIp().getBytes());
                if (log.getTime() != null && !log.getTime().equals(""))
                    table.put(lockid, new Text("time:"), log.getTime().getBytes());
                if (log.getType() != null && !log.getType().equals(""))
                    table.put(lockid, new Text("type:"), log.getType().getBytes());
                if (log.getCookie() != null && !log.getCookie().equals(""))
                    table.put(lockid, new Text("cookie:"), log.getCookie().getBytes());
                // Note: 5 fields are written into the "c" column here.
                // You can think of it as storing a map inside the "c" column.
                if (log.getRegmark() != null && !log.getRegmark().equals(""))
                    table.put(lockid, new Text("c:_regmark"), log.getRegmark().getBytes());
                if (log.getRegmark2() != null && !log.getRegmark2().equals(""))
                    table.put(lockid, new Text("c:_regmark2"), log.getRegmark2().getBytes());
                if (log.getSendshow() != null && !log.getSendshow().equals(""))
                    table.put(lockid, new Text("c:_sendshow"), log.getSendshow().getBytes());
                if (log.getCurrenturl() != null && !log.getCurrenturl().equals(""))
                    table.put(lockid, new Text("c:_currenturl"), log.getCurrenturl().getBytes());
                if (log.getAgent() != null && !log.getAgent().equals(""))
                    table.put(lockid, new Text("c:_agent"), log.getAgent().getBytes());
                // Commit the row
                table.commit(lockid);
            }
        } finally {
            reader.close();
        }
    }
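The remark that the "c" column behaves like a map can be illustrated without hbase. The sketch below is plain Java with made-up names (ColumnFamilySketch, split, groupByFamily): it splits column names such as "c:_agent" at the first colon and groups the values by the part before it, so each column ends up holding a map from qualifier to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "family:qualifier" column names; not part of the hbase API.
final class ColumnFamilySketch {
    // Splits a column name at the first colon. "ip:" yields an empty qualifier.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("missing ':' in column name: " + column);
        }
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    // Groups flat cells into family -> (qualifier -> value), keeping insertion order.
    static Map<String, Map<String, String>> groupByFamily(Map<String, String> cells) {
        Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            String[] parts = split(cell.getKey());
            grouped.computeIfAbsent(parts[0], k -> new LinkedHashMap<>())
                   .put(parts[1], cell.getValue());
        }
        return grouped;
    }
}
```

Seen this way, "ip:", "time:", "type:" and "cookie:" are maps with a single empty-named entry, while "c:" is a map with up to five entries, and new qualifiers can be added to it without changing the table definition.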
That's it. Give it a test run.
The article above explains in detail how to install and start hadoop/hbase; have a look if you are interested.