本文描述问题及解决方法基于 腾讯云数据仓库 TCHouse-P( Tencent Cloud House-P,TCHouse-P )。
在大数据领域,Greenplum 是一个广泛使用的开源数据仓库工具,特别受到数据分析师、数据科学家和企业的青睐,因为它能够提供高效、高性能的数据分析功能。然而,掌握 Greenplum 的操作技巧并非易事,为了让用户更快速地掌握这些技巧,本文将为您提供 Greenplum 操作的锦囊妙计。希望通过本文,您能够快速熟悉 Greenplum,提升工作效率和数据分析能力。
select name, context from pg_settings;
select * from pg_stat_activity where current_query != '<IDLE>';
select * from pg_stat_activity where current_query != '<IDLE>' and now() - query_start > '1 hour';(运行是否超过1小时)
base下路径:base/dboid/relfilenode,其中dbid可以通过下面语句查看是哪一个数据库
select oid, datname from pg_database where oid = xxxx(dboid)
select oid, datname from pg_database where datname = '{dbname}'
其中relfilenode可以通过下面语句查看是哪一张表(这个需要切到对应数据库下面运行)
select a.relname, b.nspname from pg_class a join pg_namespace b on a.relnamespace = b.oid where a.relfilenode = xxxx(relfilenode);
select a.relfilenode, a.relname, b.nspname from pg_class a join pg_namespace b on a.relnamespace = b.oid where a.relname = '{tablename}'
SELECT tablename FROM pg_tables
select * from information_schema.schemata;
SHOW search_path
SET search_path TO ***
是否有数据倾斜,还可以带条件查看
select gp_segment_id,count(1) from {tablename} group by 1;
SELECT gp_segment_id, count(*) FROM table_name WHERE column='value' GROUP BY gp_segment_id;
show gp_server_version;
select version();
SELECT rolname from pg_roles ;
SELECT usename from pg_user;
SELECT * FROM gp_segment_configuration;
其中content相同的是一对,比如主备,segment从0-N-1,N就是segment的数量
role代表角色,P是主,M是备
preferred_role是之前的角色, 如果和role不一致,代表有主备切换
status代表状态,d是down了,s是正常
select datname,pg_size_pretty(pg_database_size(datname)) from pg_database;
select pg_size_pretty(pg_database_size('{DATBASE_NAME}'));
select schemaname ,round(sum(pg_total_relation_size(schemaname||'.'||tablename))/1024/1024) "Size_MB" from pg_tables where schemaname='{SCHEMANAME}' group by 1;
普通表:
select pg_size_pretty(pg_relation_size('relation_name')); -- pg_relation_size返回byte大小,pg_size_pretty转化为可读的单位
select pg_size_pretty(pg_total_relation_size('relation_name')); -- 查看表的总大小,包括索引这些等
分区表:
select schemaname,tablename,round(sum(pg_total_relation_size(schemaname||'.'||partitiontablename))/1024/1024) "MB" from pg_partitions where tablename='catalog_returns' group by 1,2;
SELECT * FROM gp_toolkit.gp_disk_free ORDER BY dfsegment;
SELECT
table_schema || '.' || table_name
AS table_full_name, pg_size_pretty(pg_total_relation_size('"' ||table_schema || '"."' || table_name || '"')) AS size
FROM
information_schema.tables
ORDER BY
pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')
DESC limit 20
psql -p 20001 -d etl3d -f /home/gpadmincloud/install/TPC-DS/05_sql/179.tpcds.79.sql
psql -c "sql语句" -h 主机 -p 端口 -U 用户名 -d 数据库名
select statime,stausename from pg_stat_last_operation where classid = 'pg_class'::regclass and objid = 'schemaname.tablename'::regclass and staactionname = 'CREATE';
select * from pg_available_extensions;
select * from pg_available_extension_versions;
select user
SELECT * from gp_toolkit.gp_stats_missing;
注:空表会一直在里面
select * from INFORMATION_SCHEMA.role_table_grants where grantee='user_name' and table_name='table';
env PGOPTIONS="-c gp_session_role=utility" psql -p 40000 -d postgres
select * from pg_filespace_entry
PGPASSWORD=lambpwd11 psql -h10.0.6.43 -p5436 -Ulambuser -d testdb
select * from pg_extension
这里面的gp_segment_id对应的是gp_segment_configuration的content值
select gp_segment_id,* from alldata.t_userinfo where uid = 462324573;
数据库对象(表,索引,视图等)以及全局对象(role)等的操作(CREATE,ANALYZE等)
select objname,actionname,statime from pg_stat_operations where objname like 'sales';
\dS : 系统表也会展示出来
\dT name : 查看自定义数据类型,比如rolname和datname的类型是name
\df {func_name} : 查看函数
\dn : 查看schame
\di : 查看index
SELECT * FROM pg_trigger
最常见是表
select * from pg_locks where relation = '39964457';
其中relation是下面的oid
select oid,* from pg_class where relname = '{table_name}'
特别是alter table相关的sql,先在master查看是否有锁
select * from pg_stat_activity;
然后看是否有相互等待的session
SELECT * FROM pg_catalog.gp_dist_wait_status();
然后在每个segment上看是否有锁
-- 这里有可能出现segment上有没有kill干净的sql在继续跑,导致在segment上语句加上锁了
select * from pg_stat_activity
select * from pg_locks where pid = {xxx}(这里的pid就是pg_stat_activity的procpid)
select * from pg_class where oid = {xxx} (这里的oid就是pg_locks的relation字段,如果locktype是relation)
select relstorage from pg_class where relname = '{table_name}';
a -- 行存储AO表
h -- heap堆表、索引
x -- 外部表(external table)
v -- 视图
c -- 列存储AO表
select prosrc from pg_proc where proname = '{function_name}'
select prosrc from pg_proc where proname = '{function_name}'
select * from pg_partitions where partitiontablename = 'test_range_partition_1_prt_p1';
tablespace与filespace
查看集群tablespace与filespace的对应关系,以及路径
SELECT
spcname as tblspc
,fsname as filespc
,fsedbid as seg_dbid
,fselocation as datadir
FROM pg_tablespace pgts
,pg_filespace pgfs
,pg_filespace_entry pgfse
WHERE pgts.spcfsoid=pgfse.fsefsoid
AND pgfse.fsefsoid=pgfs.oid
ORDER BY tblspc, seg_dbid
查看某张表使用的tablespace
select * from pg_tables where tablename='xxxx';
select * from gp_toolkit.gp_workfile_entries :contains one row for each operator using disk space for workfiles on a segment at the current time.
select * from gp_toolkit.gp_workfile_usage_per_query : contains one row for each query using disk space for workfiles on a segment at the current time.
select * from gp_toolkit.gp_workfile_usage_per_segment : contains one row for each segment.
Each row displays the total amount of disk space used for workfiles on the segment at the current time.
需要先创建一个view
psql -d postgres -f /home/gpadmincloud/install/share/postgresql/contrib/gp_session_state.sql
然后
select * from session_state.session_level_memory_consumption
对于heap表,可以通过以下方式查看是否有膨胀
select * from gp_toolkit.gp_bloat_diag where bdirelname = '{table_name}';
select * from gp_toolkit.gp_bloat_diag limit 10;
-- 注意这个表也是离线更新的,可能不是很及时,analyze会有助于加快速度
对于AO表
select * from gp_toolkit.__gp_aovisimap_compaction_info('{table_name}'::regclass);
select * from pg_resqueue_attributes;
SELECT * FROM gp_toolkit.gp_resqueue_status;
SELECT * FROM gp_toolkit.gp_locks_on_resqueue WHERE lorwaiting='true';
alter resource queue pg_default with(ACTIVE_STATEMENTS=30)
在master节点上的install目录
export MASTER_DATA_DIRECTORY=/data/greenplum/master/gpseg-1
停止
gpstop
gpstop -M fast(所有连接直接kill掉)
启动
gpstart
在master节点上的install目录
gpconfig -c log_min_duration_statement -v 3600
gpconfig -c log_statement -v ddl(注意字符串的话需要双引号加单引号)
-- 注意如果配置的参数有错误,可能会导致之后的参数都不生效,比如gpconfig -c log_min_duration_statement -v 3600000
-- 修改后更加参数不同,生效的方式也不一样,有些需要重启gp,有些只需要重新加载配置等
有时候集群无法启动,比如修改了配置,但是配置超过了限制,如下错误
2018-07-10 09:18:16.187637 GMT,,,p18969,th1177327744,,,,0,,,seg-1,,,,,"FATAL","22023","2500 is outside the valid range for parameter ""max_prepared_transactions"" (1 .. 1000)",,,,,,,,"set_config_option","guc.c",4892
这个时候需要重新修改配置,就以安全模式只启动master(gpstart -m),然后调用gpconfig修改参数
如果要连接master,需要使用:
PGOPTIONS='-c gp_session_role=utility' psql
select pg_cancel_backend(procpid) 活跃查询
select pg_terminate_backend(procpid) idle
在standy master上,export以下2个环境变量
export MASTER_DATA_DIRECTORY=/data/greenplum/master/gpseg-1
export PGPORT=5432
./gpactivatestandby -d /data/greenplum/master/gpseg-1
配置客户端认证,只需要修改master上的pg_hba.conf,注意其中的地址使用的是CIDR格式,也就是如果指定某一个ip
需要写成.../32,然后gpstop -u
注意:gpstop命令非常耗内存(与segment个数有关),这里修改master节点hba文件,理论上只需要加载mater即可,也可以使用
pg_ctl reload -D /data/greenplum/master/gpseg-1
7. 恢复集群
gprecoverseg -Fo ./recv
gprecoverseg -Fi ./recv
To rebalance the cluster and bring all the segments into their preferred role
gprecoverseg -r
gpinitstandby -n
这个是分布式导出,会在每个segment下固定位置导出对应文件
env PGDATABASE="postgres" gp_dump -s -n public -p 20001 -U postgres
包括数据,注意old_dbname需要没有连接
CREATE DATABASE new_dbname TEMPLATE old_dbname;
需要在master上运行
copy (select * from {table_name} limit 100) to '{local_path}' csv
copy {table_name} from '{local_path}' csv;
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。