昨天的问题我把parallal_max_servers设置为0暂时解决了。
但今天上午用户又打电话说问题重现了。
今天上午又仔细的分析了一下。把结果贴出来:
问题的表像是查询同义词DBA_PENDING_TRANSACTIONS时一直持续执行状态,无返回数据,等待事件为:PX Deq:Execute Reply
同义词定义如下:
create or replace public synonym DBA_PENDING_TRANSACTIONS
for SYS.DBA_PENDING_TRANSACTIONS;
视图SYS.DBA_PENDING_TRANSACTIONS定义如下:
create or replace view sys.dba_pending_transactions as
(((select formatid, globalid, branchid
from gv$global_transaction
wherepreparecount > 0 and refcount = preparecount)
minus
(select global_tran_fmt, global_foreign_id, branch_id
from sys.pending_trans$ tran, sys.pending_sessions$ sess
wheretran.local_tran_id = sess.local_tran_id
andtran.state != 'collecting'
anddbms_utility.is_bit_set(tran.session_vector, sess.session_id)=1)
)
union
(select global_tran_fmt, global_foreign_id, branch_id
from sys.pending_trans$ tran, sys.pending_sessions$ sess
wheretran.local_tran_id = sess.local_tran_id
andtran.state != 'collecting'
anddbms_utility.is_bit_set(tran.session_vector, sess.session_id)=1)
)
上面语句后面两部分查询没有问题,第一部分查询也同样出现等待:PX Deq:Execute Reply
所以我将问题定位在视图gv$global_transaction上。
通过sql_trace跟踪,发现
select *
from
GV$GLOBAL_TRANSACTION where rownum < 2
call count cpu elapsed disk query current rows
------- -------------- ---------- ---------- ---------- --------------------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.57 0 0 0 0
Fetch 1 0.00 9.15 0 0 0 0
------- -------------- ---------- ---------- ---------- --------------------
total 3 0.01 9.74 0 0 0 0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Rows Row Source Operation
----------------------------------------------------------
0COUNT STOPKEY (cr=0 pr=0 pw=0 time=575792 us)
0 PX COORDINATOR(cr=0 pr=0 pw=0 time=575787 us)
0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
0 VIEWGV$GLOBAL_TRANSACTION (cr=0 pr=0 pw=0 time=0 us)
0 FIXED TABLE FULL X$K2GTE2 (cr=0 pr=0 pw=0 time=0 us)
重开一新会话,sid为3039.对X$K2GTE2进行查询,出现等待:
select * from X$K2GTE2 where rownum < 2;
enq: DX - contention
inactive transaction branch
上面两个等待交替出现,我理解可能跟分布式事物有关。
通过对v$lock查询:
select * from v$lock where type='DX'
C000000CBD897080 C000000CBD8970A0 3057 DX 36 0 6 0 1444 1
C000000CBD8A4BF8 C000000CBD8A4C18 3058 DX 46 0 6 0 1381 0
C000000CBD8A4C90 C000000CBD8A4CB0 3039 DX 36 0 0 4 6 0
根据我的理解,是会话3057持有该锁,3039正在等待。
查看会话3057执行的语句:
select sql_text
from v$sqlarea
where hash_value = (select sql_hash_value from v$session where sid = 3057)
整理如下:
INSERT INTO STATQ_AGENT_CARD
SELECT :B1,
:B5,
MAX(ARB.AGENTID),
MAX(P.RECDEFID),
P.FORMNUM,
AR.RESID,
MAX(AR.SERVNUMBER),
MAX(TO_CHAR(P.RECDATE, 'YYYYMMDD')) RECDATE,
MAX(OP.OPERNAME) RECOPNAME,
MAX(NAME) RECORGNAME,
MAX(P.CUSTNAME) CUSTNAME,
SUM(DECODE(AC.FEEID,
'SIM',
AC.FEE - AC.DISCOUNT,
'UIM',
AC.FEE - AC.DISCOUNT,
0)) SIMFEE,
SUM(DECODE(AC.FEEID, 'TEL', AC.FEE - AC.DISCOUNT, 0)) CHSTELFEE,
SUM(DECODE(AC.FEEID, 'Prepay', AC.FEE - AC.DISCOUNT, 0)) CHPREFEE,
SUM(DECODE(AC.FEEID, 'Dep', AC.FEE - AC.DISCOUNT, 0)) DEPFEE,
SUM(NVL(AC.FEE - AC.DISCOUNT, 0)) HJFEE
FROM TBCS.RECEPTIONP,
TBCS.AGENT_RESAR,
TBCS.AGENT_RES_BASE ARB,
TBCS.BAT_INSTALL_INDIV_CHARGE AC,
TBCS.OPERATOROP,
ANIZATIONORG
WHERE P.REGION = :B1
AND P.RECORGID LIKE :B4
AND P.RECDATE >= :B3
AND P.RECDATE < :B2
AND AR.REGION = :B1
AND P.OID = AR.RECID
AND P.RECDEFID = 'AgentGetCard'
AND P.RECOPID = OP.OPERID
AND P.RECORGID = ID
AND AR.REGION = AC.REGION(+)
AND AR.OID = AC.INSTALLRECID(+)
AND AR.RE
对上面涉及到的表进行查看,都是本地表,没有dblink。
3057是一套报表系统,很多会话都在执行这同一条语句。每一时刻都有3-5个会话在执行这同一条语句。
为了确定问题,我联系了业务人员,问是否可以先将报表系统停掉。得到用户的许可。
停掉报表系统后果然症状消失。为了验证,我又让开发人员开了一个报表进程执行这条语句,问题果然又重现。可以肯定问题就在这儿了。
但我不明白的是,上面执行的这个句子跟表X$K2GTE2有什么关系?原理是什么?