In day-to-day operations you will often need to migrate or copy the HBase data of one cluster to another cluster, and plenty of things can go wrong along the way.
Below are the steps and workarounds I used when doing this.
Prerequisite: the two clusters must run the same HBase version; otherwise unpredictable problems can occur and the migration may fail.
When the two clusters cannot communicate with each other, first copy the HBase data files from the source cluster down to local disk.
The concrete steps are as follows:
In the Hadoop directory on the source cluster, run the following command to copy the table's files to local disk:

bin/hadoop fs -copyToLocal /hbase/tab_keywordflow /home/test/xiaochenbak
Then move those files over to the destination cluster and load them into that cluster's HDFS, under the table's directory.
If the destination cluster already has files for this table, delete them first, then copy:

bin/hadoop fs -rmr /hbase/tab_keywordflow
bin/hadoop fs -copyFromLocal /home/other/xiaochenbak /hbase/tab_keywordflow

Here /home/other/xiaochenbak is the local copy of the data on the destination cluster.
Next, rebuild the table's region information in the .META. table:
bin/hbase org.jruby.Main /home/other/hbase/bin/add_table.rb /hbase/tab_keywordflow
/home/other/hbase/bin/add_table.rb is a Ruby script that can be run directly; save the following content as add_table.rb:
#
# Copyright 2009 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Script adds a table back to a running hbase.
# Currently only works on if table data is in place.
#
# To see usage for this script, run:
#
#   ${HBASE_HOME}/bin/hbase org.jruby.Main addtable.rb
#
include Java
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.regionserver.HRegion
import org.apache.hadoop.hbase.HRegionInfo
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.util.FSUtils
import org.apache.hadoop.hbase.util.Writables
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.commons.logging.LogFactory

# Name of this script
NAME = "add_table"

# Print usage for this script
def usage
  puts 'Usage: %s.rb TABLE_DIR [alternate_tablename]' % NAME
  exit!
end

# Get configuration to use.
c = HBaseConfiguration.new()

# Set hadoop filesystem configuration using the hbase.rootdir.
# Otherwise, we'll always use localhost though the hbase.rootdir
# might be pointing at hdfs location.
c.set("fs.default.name", c.get(HConstants::HBASE_DIR))
fs = FileSystem.get(c)

# Get a logger and a metautils instance.
LOG = LogFactory.getLog(NAME)

# Check arguments
if ARGV.size < 1 || ARGV.size > 2
  usage
end

# Get cmdline args.
srcdir = fs.makeQualified(Path.new(java.lang.String.new(ARGV[0])))
if not fs.exists(srcdir)
  raise IOError.new("src dir " + srcdir.toString() + " doesn't exist!")
end

# Get table name
tableName = nil
if ARGV.size > 1
  tableName = ARGV[1]
  raise IOError.new("Not supported yet")
else
  # If none provided use dirname
  tableName = srcdir.getName()
end
HTableDescriptor.isLegalTableName(tableName.to_java_bytes)

# Figure locations under hbase.rootdir
# Move directories into place; be careful not to overwrite.
rootdir = FSUtils.getRootDir(c)
tableDir = fs.makeQualified(Path.new(rootdir, tableName))

# If a directory currently in place, move it aside.
if srcdir.equals(tableDir)
  LOG.info("Source directory is in place under hbase.rootdir: " + srcdir.toString())
elsif fs.exists(tableDir)
  movedTableName = tableName + "." + java.lang.System.currentTimeMillis().to_s
  movedTableDir = Path.new(rootdir, java.lang.String.new(movedTableName))
  LOG.warn("Moving " + tableDir.toString() + " aside as " + movedTableDir.toString())
  raise IOError.new("Failed move of " + tableDir.toString()) unless fs.rename(tableDir, movedTableDir)
  LOG.info("Moving " + srcdir.toString() + " to " + tableDir.toString())
  raise IOError.new("Failed move of " + srcdir.toString()) unless fs.rename(srcdir, tableDir)
end

# Clean mentions of table from .META.
# Scan the .META. and remove all lines that begin with tablename
LOG.info("Deleting mention of " + tableName + " from .META.")
metaTable = HTable.new(c, HConstants::META_TABLE_NAME)
tableNameMetaPrefix = tableName + HConstants::META_ROW_DELIMITER.chr
scan = Scan.new((tableNameMetaPrefix + HConstants::META_ROW_DELIMITER.chr).to_java_bytes)
scanner = metaTable.getScanner(scan)
# Use java.lang.String doing compares.  Ruby String is a bit odd.
tableNameStr = java.lang.String.new(tableName)
while (result = scanner.next())
  rowid = Bytes.toString(result.getRow())
  rowidStr = java.lang.String.new(rowid)
  if not rowidStr.startsWith(tableNameMetaPrefix)
    # Gone too far, break
    break
  end
  LOG.info("Deleting row from catalog: " + rowid)
  d = Delete.new(result.getRow())
  metaTable.delete(d)
end
scanner.close()

# Now, walk the table and per region, add an entry
LOG.info("Walking " + srcdir.toString() + " adding regions to catalog table")
statuses = fs.listStatus(srcdir)
for status in statuses
  next unless status.isDir()
  next if status.getPath().getName() == "compaction.dir"
  regioninfofile = Path.new(status.getPath(), HRegion::REGIONINFO_FILE)
  unless fs.exists(regioninfofile)
    LOG.warn("Missing .regioninfo: " + regioninfofile.toString())
    next
  end
  is = fs.open(regioninfofile)
  hri = HRegionInfo.new()
  hri.readFields(is)
  is.close()
  # TODO: Need to redo table descriptor with passed table name and then
  # recalculate the region encoded names.
  p = Put.new(hri.getRegionName())
  p.add(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER, Writables.getBytes(hri))
  metaTable.put(p)
  LOG.info("Added to catalog: " + hri.toString())
end
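Once the script has run, a quick sanity check from the HBase shell is worth doing (this is my suggestion, not part of the original steps; list and count are standard shell commands, and the count should match what the source cluster reports):

bin/hbase shell
hbase> list                      # the migrated table should show up
hbase> count 'tab_keywordflow'   # compare against the source cluster's row count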
That is my whole approach. If the two clusters can communicate with each other, things are even simpler: you can skip staging the files on local disk and just move them between hosts directly, for example with scp.
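To recap, here is the offline flow condensed into one sketch, using the same example paths as above. other-cluster-host is a placeholder I added for whatever host on the destination cluster you can reach; substitute your own host and transfer method:

# On the source cluster: export the table's files from HDFS to local disk
bin/hadoop fs -copyToLocal /hbase/tab_keywordflow /home/test/xiaochenbak

# Ship the files to the destination cluster (scp works when the hosts can
# reach each other; any other transfer method works too)
scp -r /home/test/xiaochenbak other-cluster-host:/home/other/xiaochenbak

# On the destination cluster: clear any stale copy of the table, load the
# files, then rebuild the region entries in .META.
bin/hadoop fs -rmr /hbase/tab_keywordflow
bin/hadoop fs -copyFromLocal /home/other/xiaochenbak /hbase/tab_keywordflow
bin/hbase org.jruby.Main /home/other/hbase/bin/add_table.rb /hbase/tab_keywordflow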