ToMMeR -- Efficient Entity Mention Detection from Large Language Models
Paper: 2510.19410
ToMMeR is a lightweight probing model that extracts emergent mention-detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.
This model plugs into layer 13 of meta-llama/Llama-3.2-1B, with computational overhead no greater than that of an additional attention head.
| Property | Value |
|---|---|
| Base LLM | meta-llama/Llama-3.2-1B |
| Layer | 13 |
| #Params | 264.2K |
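As a rough sanity check on the overhead claim, the parameter count can be compared to a single attention head. The dimensions below are standard for Llama-3.2-1B (`d_model = 2048`, `head_dim = 64`), but this back-of-envelope comparison is our own sketch, not taken from the paper, and it ignores grouped-query key/value sharing.

```python
# Approximate parameter count of one attention head (q/k/v/o projections),
# assuming Llama-3.2-1B dimensions. This is an illustrative estimate only.
d_model, head_dim = 2048, 64
attn_head_params = 4 * d_model * head_dim  # q, k, v, o slices for one head

tommer_params = 264_200  # from the table above (264.2K)

print(attn_head_params)                     # 524288
print(tommer_params < attn_head_params)     # True
```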
To use ToMMeR, first install its codebase:

```bash
pip install git+https://github.com/VictorMorand/llm2ner.git
```
By default, ToMMeR outputs span probabilities, but we also provide built-in options for decoding entities.
```python
import llm2ner
from llm2ner import ToMMeR
from xpm_torch.huggingface import TorchHFHub

tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L13_R64")

# Load the backbone LLM, optionally cutting unused layers to save GPU memory.
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)
```
#### Raw Inference
```python
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# Tokenize to shape (1, seq_len).
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# Output raw span scores.
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# Use the given decoding strategy to infer entities.
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")
```
```
Input text: Large language models are awesome
Raw Output shape: torch.Size([1, 6, 6])
Predicted entities: ['Large language models']
```
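To build intuition for what the decoding strategies do with the raw `(batch, seq_len, seq_len)` scores, here is a toy sketch over a plain score matrix, where entry `[b][e]` scores the candidate span from token `b` to token `e`. This is an illustration of our own, not the actual `infer_entities` implementation, which may differ in details.

```python
# Toy illustration of span decoding over a (seq_len, seq_len) score matrix.
# scores[b][e] is the probability that tokens b..e form an entity mention.

def threshold_decode(scores, threshold=0.5):
    """Keep every span (b, e), b <= e, whose score exceeds the threshold."""
    n = len(scores)
    return [(b, e) for b in range(n) for e in range(b, n) if scores[b][e] > threshold]

def greedy_decode(scores, threshold=0.5):
    """Flat segmentation: take best-scoring spans first, skipping overlaps."""
    candidates = sorted(threshold_decode(scores, threshold),
                        key=lambda be: scores[be[0]][be[1]], reverse=True)
    chosen, used = [], set()
    for b, e in candidates:
        if not any(i in used for i in range(b, e + 1)):
            chosen.append((b, e))
            used.update(range(b, e + 1))
    return sorted(chosen)

# 4-token example: one strong span [0..2] and a weaker overlapping span [1..2].
scores = [
    [0.1, 0.2, 0.9, 0.0],
    [0.0, 0.1, 0.6, 0.0],
    [0.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.3],
]
print(threshold_decode(scores))  # [(0, 2), (1, 2)] -- overlapping spans allowed
print(greedy_decode(scores))     # [(0, 2)]         -- flat, non-overlapping
```

Threshold decoding can return nested or overlapping mentions, while greedy decoding yields a flat segmentation, which matches the `"threshold"` vs. `"greedy"` options shown in the snippets.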
We also provide a built-in plotting utility for interactive visualization of predictions. Please visit the repository for more details and a demo notebook.
```python
import llm2ner
from llm2ner import ToMMeR
from xpm_torch.huggingface import TorchHFHub

tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L13_R64")

# Load the backbone LLM, optionally cutting unused layers to save GPU memory.
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = (
    "Large language models are awesome. While trained on language modeling, "
    "they exhibit emergent zero-shot abilities that make them suitable for a "
    "wide range of tasks, including Named Entity Recognition (NER)."
)

# Fancy interactive output.
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
```