How to implement/modify opera_beam_search()
It is important to set the key_position input parameter, as it determines which range of input tokens is considered within the local window of self-attention!
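A minimal sketch of how the call site might look. Apart from key_position itself, the keyword names (opera_decoding, scale_factor, threshold, num_attn_candidates, penalty_weights) and the example offsets are assumptions about the modified generate() interface, not confirmed against the codebase.

```python
import torch

def run_opera_generation(model, input_ids, image_tensor, num_img_tokens=576, img_start=5):
    """Hypothetical call-site sketch for OPERA decoding; offsets are illustrative."""
    prompt_len = input_ids.shape[1]
    # key_position tells opera_beam_search() which token spans the over-trust
    # penalty and the rollback should look at (image span vs. generated response).
    key_position = {
        "image_start": img_start,
        "image_end": img_start + num_img_tokens,
        "response_start": prompt_len + num_img_tokens,
    }
    with torch.inference_mode():
        return model.generate(
            input_ids,
            images=image_tensor,
            num_beams=5,
            max_new_tokens=512,
            do_sample=False,
            opera_decoding=True,        # assumed flag enabling opera_beam_search()
            key_position=key_position,
            scale_factor=50,            # sigma: scale-up applied to attention values
            threshold=15,               # rollback trigger (overlap count)
            num_attn_candidates=5,      # N_can: candidate set size per beam
            penalty_weights=1.0,        # alpha: weight of the over-trust penalty
        )
```

The image span depends on the prompt template and visual encoder (e.g., 576 tokens for a 24x24 patch grid), so in practice the offsets should be computed from the actual tokenized prompt rather than hard-coded.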
Abstract
Hallucination often relates to knowledge-aggregation patterns manifested in the self-attention matrix (the LLM focuses on only a few summary tokens rather than all previous tokens)
OPERA introduces a penalty term on the model logits during beam-search decoding, along with a rollback strategy that retrospects the presence of summary tokens among the previously generated tokens
Introduction
Recurring pattern: hallucination tends to appear right after a columnar attention pattern emerges
Some tokens serve as summary tokens (often called anchor tokens), which let the LLM aggregate previous information onto a few anchor tokens at shallow layers and predict the next token based on these anchors at deep layers
In MLLMs, vision tokens are input first, but vision information diminishes as information is relayed through the summary tokens
With OPERA's over-trust penalty, beam-search candidates exhibiting the over-trust pattern become unlikely to be selected.
Rollback strategy: retrospection is triggered when the locations of the maximum in-window penalty scores overlap (i.e., recent decoding steps keep pointing to the same summary-token position) at least a threshold number of times.
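A minimal sketch of the trigger condition, assuming the argmax column location of the column-wise scores is recorded at each recent decoding step; the function name and threshold default are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

def should_rollback(max_score_locations: List[int], threshold: int = 15) -> Tuple[bool, Optional[int]]:
    """Trigger retrospection when recent decoding steps keep selecting the same
    column (summary-token position) as the maximum of the in-window penalty scores.

    max_score_locations: argmax column index of the column-wise scores,
                         recorded at each recent decoding step.
    """
    if not max_score_locations:
        return False, None
    location, overlap = Counter(max_score_locations).most_common(1)[0]
    return overlap >= threshold, location
```

When the trigger fires, decoding rolls back to the position just after the repeatedly selected summary token and re-selects the next token there, excluding the token that was originally chosen (the rollback itself is not sketched here).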
Method
MLLM Input formulation
MLLM Model Forward
MLLM Decoding
Over-Trust Logit Penalty
Gather the previous self-attention weights within a local window (the most recent generated tokens) to characterize the knowledge-aggregation pattern
Preprocess by filling the upper triangle of the local attention matrix with zeros and scaling up the attention values
Then perform the column-wise multiplication over the lower triangle of the attention matrix to obtain a vector of column-wise scores
Lastly, select the top-N_can tokens from the logits of each beam to form the candidate set Y; each candidate's logit is then reduced by the weighted maximum column-wise score as the over-trust penalty, as in the sketch below
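A minimal sketch of the penalty computation, assuming the local self-attention window has already been reduced (e.g., averaged over heads and a chosen layer) into a (k, k) matrix per candidate; the sigma and alpha defaults and the per-candidate attention extraction are assumptions about implementation details.

```python
import torch

def column_wise_scores(local_attn: torch.Tensor, sigma: float = 50.0) -> torch.Tensor:
    """Column-wise knowledge-aggregation scores for a (k, k) local attention window.

    Zero the upper triangle, scale up the small post-softmax values, then take the
    product down each column of the lower triangle.
    """
    k = local_attn.size(0)
    w = torch.tril(local_attn) * sigma
    scores = torch.empty(k)
    for j in range(k):
        # product over the entries of column j on and below the diagonal,
        # i.e. how strongly later tokens keep attending back to token j
        scores[j] = torch.prod(w[j:, j])
    return scores

def apply_over_trust_penalty(cand_logits: torch.Tensor,
                             cand_local_attn: torch.Tensor,
                             alpha: float = 1.0) -> torch.Tensor:
    """Penalize each candidate's logit by the maximum column-wise score of the
    window that would result from appending that candidate.

    cand_logits:     (N_can,) logits of the top-N_can candidates of one beam.
    cand_local_attn: (N_can, k, k) local attention window per candidate, whose
                     last row is the candidate's own attention over the window.
    """
    penalties = torch.stack([column_wise_scores(a).max() for a in cand_local_attn])
    return cand_logits - alpha * penalties
```

Scaling by sigma before the column-wise product keeps the product of many sub-1 attention values from underflowing; a natural way to obtain each candidate's attention row is a one-step forward pass with that candidate appended to the beam.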
Retrospection-Allocation Strategy