"""
Generates sequences of token ids for models with a language modeling head using **beam search decoding** and
can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

<Tip warning={true}>

In most cases, you do not need to call [`~generation.GenerationMixin.beam_search`] directly. Use
[`~generation.GenerationMixin.generate`] instead. For an overview of generation strategies and code
examples, check the [following guide](../generation_strategies).

</Tip>
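
For reference, the same decoding can be run through the high-level API in a single call. A minimal sketch,
mirroring the manual setup in the example at the end of this docstring (`num_beams` and `min_length` map onto
the `BeamSearchScorer` and `MinLengthLogitsProcessor` built there):

```python
# equivalent high-level call: beam search with 3 beams and a minimum generated length of 5 tokens
outputs = model.generate(encoder_input_ids, num_beams=3, min_length=5)
```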

Parameters:
    input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
        The sequence used as a prompt for the generation.
    beam_scorer (`BeamScorer`):
        A derived instance of [`BeamScorer`] that defines how beam hypotheses are constructed, stored, and
        sorted during generation. For more information, read the documentation of [`BeamScorer`].
    logits_processor (`LogitsProcessorList`, *optional*):
        An instance of [`LogitsProcessorList`]. List of instances of classes derived from [`LogitsProcessor`]
        used to modify the prediction scores of the language modeling head applied at each generation step.
    stopping_criteria (`StoppingCriteriaList`, *optional*):
        An instance of [`StoppingCriteriaList`]. List of instances of classes derived from
        [`StoppingCriteria`] used to tell if the generation loop should stop.
    max_length (`int`, *optional*, defaults to 20):
        **DEPRECATED**. Use `logits_processor` or `stopping_criteria` directly to cap the number of generated
        tokens (see the sketch after this parameter list). The maximum length of the sequence to be generated.
    pad_token_id (`int`, *optional*):
        The id of the *padding* token.
    eos_token_id (`Union[int, List[int]]`, *optional*):
        The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens.
    output_attentions (`bool`, *optional*, defaults to `False`):
        Whether or not to return the attentions tensors of all attention layers. See `attentions` under
        returned tensors for more details.
    output_hidden_states (`bool`, *optional*, defaults to `False`):
        Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
        for more details.
    output_scores (`bool`, *optional*, defaults to `False`):
        Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
    return_dict_in_generate (`bool`, *optional*, defaults to `False`):
        Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
    synced_gpus (`bool`, *optional*, defaults to `False`):
        Whether to continue running the while loop until `max_length` is reached (needed for ZeRO stage 3).
    model_kwargs:
        Additional model specific kwargs will be forwarded to the `forward` function of the model. If the
        model is an encoder-decoder model, the kwargs should include `encoder_outputs`.
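
As a sketch of the replacement for the deprecated `max_length` argument, the length cap can instead be passed
as an explicit stopping criterion (assuming the `MaxLengthCriteria` class exported by `transformers`):

```python
from transformers import MaxLengthCriteria, StoppingCriteriaList

# cap generation at 20 tokens without relying on the deprecated `max_length` argument
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
```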

Return:
    [`~generation.BeamSearchDecoderOnlyOutput`], [`~generation.BeamSearchEncoderDecoderOutput`] or
    `torch.LongTensor`: A `torch.LongTensor` containing the generated tokens (default behaviour) or a
    [`~generation.BeamSearchDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and
    `return_dict_in_generate=True` or a [`~generation.BeamSearchEncoderDecoderOutput`] if
    `model.config.is_encoder_decoder=True`.

Examples:

```python
>>> from transformers import (
...     AutoTokenizer,
...     AutoModelForSeq2SeqLM,
...     LogitsProcessorList,
...     MinLengthLogitsProcessor,
...     BeamSearchScorer,
... )
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

>>> encoder_input_str = "translate English to German: How old are you?"
>>> encoder_input_ids = tokenizer(encoder_input_str, return_tensors="pt").input_ids

>>> # let's run beam search using 3 beams
>>> num_beams = 3
>>> # define decoder start token ids
>>> input_ids = torch.ones((num_beams, 1), device=model.device, dtype=torch.long)
>>> input_ids = input_ids * model.config.decoder_start_token_id

>>> # add encoder_outputs to model keyword arguments
>>> model_kwargs = {
...     "encoder_outputs": model.get_encoder()(
...         encoder_input_ids.repeat_interleave(num_beams, dim=0), return_dict=True
...     )
... }

>>> # instantiate beam scorer
>>> beam_scorer = BeamSearchScorer(
...     batch_size=1,
...     num_beams=num_beams,
...     device=model.device,
... )

>>> # instantiate logits processors
>>> logits_processor = LogitsProcessorList(
...     [
...         MinLengthLogitsProcessor(5, eos_token_id=model.config.eos_token_id),
...     ]
... )

>>> outputs = model.beam_search(input_ids, beam_scorer, logits_processor=logits_processor, **model_kwargs)

>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Wie alt bist du?']
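
>>> # a sketch of the structured output: with `return_dict_in_generate=True` the call returns a
>>> # `BeamSearchEncoderDecoderOutput` whose `sequences` field holds the generated token ids
>>> # (`BeamSearchScorer` is stateful, so a fresh instance is needed for this second run)
>>> beam_scorer = BeamSearchScorer(batch_size=1, num_beams=num_beams, device=model.device)
>>> outputs = model.beam_search(
...     input_ids,
...     beam_scorer,
...     logits_processor=logits_processor,
...     return_dict_in_generate=True,
...     output_scores=True,
...     **model_kwargs,
... )

>>> tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True)
['Wie alt bist du?']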
'''"""