Title Year Abstract/Limitation Summary
ELM (Embedding Language Model) Mar 2024 (Ver2) Making embeddings more interpretable by employing an LLM: transforming abstract vectors into understandable narratives

Task:

  1. Enhancing concept activation vectors *(a kind of XAI: vectors in the high-dimensional space of a neural network that represent a particular concept)
  2. Communicating novel embedded entities
  3. Decoding user preferences in recommender systems

Limitations

  1. Lack of experiments: only movie and Amazon data
  2. No baseline
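  3. All 24 tasks are original/tedious

Summary: A detailed interpretation of embedding representations was needed. In recommender systems, item embeddings may implicitly embody details about quality, usability, design, etc. One may also want to know the properties of a specific embedding point that does not correspond to any existing item (a hypothetical item). Until now, this was not possible.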

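Example: the possible characteristics of a movie lying between Forrest Gump and Inception, a more comedic version of Forrest Gump, or an animated version of Forrest Gump can all be represented and captured as embeddings.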
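A minimal sketch of how such hypothetical points could be constructed, assuming the "in-between" movie is taken as a linear interpolation of two item embeddings and the "more comedic" variant as a shift along some concept direction. The toy vectors, the `interpolate` helper, and `comedy_direction` are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def interpolate(a, b, alpha=0.5):
    """Return the point a fraction `alpha` of the way from a to b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (1.0 - alpha) * a + alpha * b

# Toy 4-d embeddings (assumed values); real ones would come from a trained model.
forrest_gump = np.array([0.9, 0.1, 0.3, 0.0])
inception = np.array([0.1, 0.8, 0.2, 0.6])

# A hypothetical item halfway between the two movies.
midpoint = interpolate(forrest_gump, inception, alpha=0.5)

# A hypothetical "more comedic Forrest Gump": shift along an assumed comedy direction.
comedy_direction = np.array([0.0, 0.0, 1.0, 0.0])
funnier_gump = forrest_gump + 0.5 * comedy_direction
```

Neither `midpoint` nor `funnier_gump` corresponds to an existing item, which is exactly the case where a decoder that turns embeddings into text is needed.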

Main goal: introducing a novel framework to interpret domain embeddings

Main method: training adapter layers to map domain embedding vectors into the token-level embedding space of an LLM.

Reinforcement learning from AI feedback (RLAIF) is also used.

Training is split into two stages. In the first stage, we train the adapter EA on tasks in T by keeping all other parameters (E0, M0) frozen. Since M0 is pretrained, the learned first-stage mapping from W to Z improves convergence in the next stage. In the second stage, we fine-tune the full model by training all parameters (E0, M0, EA).
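The two-stage schedule can be illustrated with a toy sketch. This is not the paper's setup: the pretrained model M0 is replaced by a single linear map `M_0`, the token embedding layer E0 is omitted, the adapter `E_A` is a plain matrix from W to Z, and the data, dimensions, and learning rates are all assumed values chosen only to show which parameters move in each stage.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_w, d_z, d_out = 64, 6, 4, 3

X = rng.normal(size=(n, d_w))                      # domain embeddings in W
Y = X @ rng.normal(scale=0.5, size=(d_w, d_out))   # toy regression targets

E_A = rng.normal(scale=0.1, size=(d_w, d_z))       # adapter: W -> Z
M_0 = rng.normal(scale=0.5, size=(d_z, d_out))     # linear stand-in for the pretrained model

def loss(E, M):
    """Mean squared error of the adapter followed by the model."""
    return float(np.mean((X @ E @ M - Y) ** 2))

def train(E, M, steps, lr, update_model):
    """Gradient descent on the MSE; M is updated only when update_model is True."""
    E, M = E.copy(), M.copy()
    for _ in range(steps):
        Z = X @ E                                  # adapter output in Z
        d_pred = 2.0 * (Z @ M - Y) / Y.size        # dLoss/dPrediction
        grad_E = X.T @ d_pred @ M.T
        grad_M = Z.T @ d_pred
        E -= lr * grad_E
        if update_model:
            M -= lr * grad_M
    return E, M

# Stage 1: train only the adapter, model frozen.
E_1, M_1 = train(E_A, M_0, steps=200, lr=0.05, update_model=False)
# Stage 2: fine-tune adapter and model together (smaller step size assumed).
E_2, M_2 = train(E_1, M_1, steps=200, lr=0.01, update_model=True)
```

After stage 1, `M_1` is identical to `M_0`; only the adapter has changed. Stage 2 then starts from an adapter that already maps into a region of Z the model can use.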

Embedding encoding method: To train behavioral embeddings, we use matrix factorization (MF) computed using weighted alternating least squares (WALS). Semantic embeddings are the second type, and are generated using textual descriptions of movies. Specifically, we use a pretrained dual-encoder language model (DLM) similar to Sentence-T5 and generalizable T5-based dense retrievers. More specifically, we concatenate plot descriptions and reviews for each movie, and input these to the DLM. We then average the resulting output vectors to generate the semantic embeddings.
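For the behavioral embeddings, a minimal dense sketch of weighted alternating least squares is shown here. The `wals` and `objective` helpers, the toy rating matrix, the weighting scheme (1.0 for observed ratings, 0.1 for missing ones), the rank, and the regularization strength are all assumptions for illustration; the notes do not specify the paper's actual configuration.

```python
import numpy as np

def objective(R, W, U, V, reg):
    """Weighted squared error plus L2 regularization."""
    return float(np.sum(W * (R - U @ V.T) ** 2) + reg * (np.sum(U**2) + np.sum(V**2)))

def wals(R, W, rank=2, reg=0.1, iters=20, seed=0):
    """Alternate closed-form weighted ridge solves for user and item factors."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for i in range(n_users):          # fix V, solve for each user row
            VW = V * W[i][:, None]
            U[i] = np.linalg.solve(VW.T @ V + ridge, VW.T @ R[i])
        for j in range(n_items):          # fix U, solve for each item row
            UW = U * W[:, j][:, None]
            V[j] = np.linalg.solve(UW.T @ U + ridge, UW.T @ R[:, j])
    return U, V

# Toy user x movie ratings; 0 marks an unobserved entry (assumed convention).
R = np.array([[5, 4, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)
W = np.where(R > 0, 1.0, 0.1)             # down-weight unobserved entries

U, V = wals(R, W)                          # rows of V play the role of item embeddings
```

Each inner solve minimizes the objective exactly for one row with the other factor held fixed, so the objective never increases from one sweep to the next.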

*How CAVs Work

  1. Defining Concepts: First, you define the concept you're interested in. This involves gathering a set of examples that embody the concept (positive examples) and, optionally, a set of examples that do not (negative examples).
  2. Training a Linear Model: You then train a simple linear classifier (e.g., logistic regression) using the activations of a layer in your neural network as features. This classifier is trained to distinguish between the activations produced by your positive and negative examples. The weight vector of this classifier, which represents the boundary between the concept and non-concept activations, serves as the Concept Activation Vector for your defined concept.
  3. Interpreting the Influence of Concepts: Once you have a CAV for a concept, you can use it to interpret the model's decisions. This is typically done by measuring the sensitivity of the model's output to changes in the direction of the CAV. If moving along the CAV in the activation space significantly changes the output, it suggests that the concept captured by the CAV is important for the model's decisions.
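The three steps above can be sketched on synthetic data. Everything here is assumed for illustration: the activations are random vectors shifted along a known `concept` direction, the classifier is a hand-written logistic regression rather than a library call, the model head `f` is a toy linear function, and the sensitivity is approximated with a finite difference.

```python
import numpy as np

def train_cav(pos, neg, steps=500, lr=0.1):
    """Fit logistic regression on activations; return the unit-norm weight vector."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(concept)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

def sensitivity(f, activation, cav, eps=1e-4):
    """Finite-difference directional derivative of f along the CAV."""
    return (f(activation + eps * cav) - f(activation)) / eps

rng = np.random.default_rng(0)
dim = 8
concept = np.zeros(dim)
concept[0] = 1.0                                  # assumed ground-truth concept direction

# Step 1: positive and negative examples (synthetic layer activations).
pos = rng.normal(size=(200, dim)) + 2.0 * concept
neg = rng.normal(size=(200, dim)) - 2.0 * concept

# Step 2: the classifier's weight vector serves as the CAV.
cav = train_cav(pos, neg)

# Step 3: how much does a toy model output change when moving along the CAV?
head = 3.0 * concept                              # hypothetical output head weights
f = lambda a: float(a @ head)
score = sensitivity(f, pos[0], cav)
```

A positive `score` means that nudging the activation toward the concept raises the toy output, which is the sense in which the concept is "important" for the decision.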