How to use TechWolf/JobBERT-v2 with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("TechWolf/JobBERT-v2")

sentences = [
    "Program Coordinator RN",
    "discuss the medical history of the healthcare user, evidence-based approach in general practice, apply various lifting techniques, establish daily priorities, manage time, demonstrate disciplinary expertise, tolerate sitting for long periods, think critically, provide professional care in nursing, attend meetings, represent union members, nursing science, manage a multidisciplinary team involved in patient care, implement nursing care, customer service, work under supervision in care, keep up-to-date with training subjects, evidence-based nursing care, operate lifting equipment, follow code of ethics for biomedical practices, coordinate care, provide learning support in healthcare",
    "provide written content, prepare visual data, design computer network, deliver visual presentation of data, communication, operate relational database management system, ICT communications protocols, document management, use threading techniques, search engines, computer science, analyse network bandwidth requirements, analyse network configuration and performance, develop architectural plans, conduct ICT code review, hardware architectures, computer engineering, video-games functionalities, conduct web searches, use databases, use online tools to collaborate",
    "nursing science, administer appointments, administrative tasks in a medical environment, intravenous infusion, plan nursing care, prepare intravenous packs, work with nursing staff, supervise nursing staff, clinical perfusion",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
Model outputs 768-dim embeddings instead of 1024 as documented
Hello,
I'm trying out the JobBERT-v2 model with the sentence-transformers library. According to the documentation, this model is supposed to output 1024-dimensional embeddings. However, during inference I'm receiving 768-dimensional embeddings.
I suspect that the Asym layer is primarily designed for training scenarios where embeddings like "anchor" and "positive" are compared or contrasted. During inference, using such layers without the corresponding training dynamics might not yield the expected transformations.
Is the Asym layer intended only for training purposes in the JobBERT-v2 model, or am I doing something wrong?
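For reference, a minimal call that reproduces what I'm seeing (the job title here is just an example):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("TechWolf/JobBERT-v2")
# Plain encode() call, with no special handling of the Asym layer
embeddings = model.encode(["Program Coordinator RN"])
print(embeddings.shape)  # (1, 768), not the documented (1, 1024)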
Model Name: jensjorisdecorte/JobBERT-v2
Library Versions:
sentence-transformers: 3.1.0
transformers: 4.44.2
torch: 2.4.1+cu118
Python Version: 3.8
Device: CUDA
Thanks,
Hi @Bhanu3,
By default, the data is not passed through the Asym layer at inference; this is a consequence of how the sentence-transformers package is designed. To make sure that job titles are passed through the right branch of the Asym layer, please follow the code example in the README:
import torch
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim
# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v2")
def encode_batch(jobbert_model, texts):
    # Tokenize the batch and move it to the model's device
    features = jobbert_model.tokenize(texts)
    features = batch_to_device(features, jobbert_model.device)
    # Route the inputs through the "anchor" branch of the Asym layer
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu().numpy()

def encode(jobbert_model, texts, batch_size: int = 8):
    # Sort texts by length and keep track of original indices
    sorted_indices = np.argsort([len(text) for text in texts])
    sorted_texts = [texts[i] for i in sorted_indices]

    embeddings = []
    # Encode in batches
    for i in tqdm(range(0, len(sorted_texts), batch_size)):
        batch = sorted_texts[i:i+batch_size]
        embeddings.append(encode_batch(jobbert_model, batch))

    # Concatenate embeddings and reorder to original indices
    sorted_embeddings = np.concatenate(embeddings)
    original_order = np.argsort(sorted_indices)
    return sorted_embeddings[original_order]
# Example usage
embeddings = encode(model, [...])
# Calculate cosine similarity matrix
similarities = cos_sim(embeddings, embeddings)
print(similarities)
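A quick sanity check along these lines (the job titles are arbitrary examples) should now show the documented embedding size:

titles = ["Program Coordinator RN", "Network Engineer"]
embeddings = encode(model, titles)
print(embeddings.shape)  # (2, 1024), matching the documented dimensionality

similarities = cos_sim(embeddings, embeddings)
print(similarities)  # 2x2 cosine similarity matrix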