
What are the options for running LLM locally using pre-trained weights?

WBOY, 2024-02-22 12:34:09


Question content

I have a cluster that is not connected to the internet, although it does have access to a repository of model weights. I need to run LLM inference on it.

The only option I've found so far is to combine the transformers and langchain modules, but I'd rather not have to tune the model's hyperparameters myself. I came across Ollama, but I can't install anything on the cluster except Python libraries. So, naturally, I wondered: what are my options for running LLM inference? A couple of more specific questions:

  1. Can I install just the ollama-python package without installing Ollama's Linux software? Or do I need both to run inference?
  2. If I do manage to install Ollama on the cluster, how do I point it at the pretrained weights? If it helps, they are stored in (sometimes multiple) .bin files.

Correct Answer


You don't actually need Ollama. Instead, you can run the LLM directly from Python, e.g. a Mistral model through LangChain's GPT4All wrapper:

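from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler
callbacks = [StreamingStdOutCallbackHandler()]  # stream generated tokens to stdout
llm = GPT4All(
    model="/home/jeff/.cache/huggingface/hub/gpt4all/mistral-7b-openorca.q4_0.gguf",
    device='gpu', n_threads=8,
    callbacks=callbacks, verbose=True)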
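Note that the GPT4All example above loads a .gguf file, while the weights in the question are .bin files, which are usually PyTorch checkpoints and would have to be converted first (for example with llama.cpp's conversion scripts). A quick way to tell which format a file actually is, without installing anything, is to read its first bytes. This is a minimal sketch: `sniff_weight_format` is a hypothetical helper, and it only recognizes the GGUF magic (`GGUF`), the zip signature that `torch.save` has used by default since PyTorch 1.6, and the protocol-2 pickle header of older `torch.save` files.

```python
import sys
from pathlib import Path

# Leading bytes that identify each container format (checked by this sketch only).
GGUF_MAGIC = b"GGUF"            # GGUF files (llama.cpp / GPT4All) start with ASCII "GGUF"
ZIP_MAGIC = b"PK\x03\x04"       # torch.save() output since PyTorch 1.6 is a zip archive
LEGACY_PICKLE = b"\x80\x02"     # pre-1.6 torch.save() pickles start with protocol-2 header


def sniff_weight_format(path):
    """Guess a weight file's container format from its first four bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == GGUF_MAGIC:
        return "gguf"
    if head == ZIP_MAGIC:
        return "pytorch-zip"
    if head[:2] == LEGACY_PICKLE:
        return "pytorch-legacy"
    return "unknown"


if __name__ == "__main__":
    # Usage: python sniff.py path/to/*.bin
    for name in sys.argv[1:]:
        print(f"{name}: {sniff_weight_format(Path(name))}")
```

Anything reported as `pytorch-zip` or `pytorch-legacy` is a good fit for the transformers route rather than GPT4All.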

Or, for Falcon, wrap a Hugging Face transformers pipeline for LangChain:

from transformers import AutoTokenizer, pipeline
import torch

# Hub ID; on an offline cluster, pass the path of a local copy of the model instead
model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(  # named `pipe` so it does not shadow the `pipeline` function
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    # trust_remote_code=True,
    device_map="auto",  # requires the accelerate package
    max_new_tokens=100,
    # max_length=200,
)


from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipe)
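For question 2 (weights stored in one or several .bin files), the transformers route can load them from a local directory instead of the Hub, as long as the directory uses the standard Hugging Face layout: a `config.json` plus either a single `pytorch_model.bin` or sharded `.bin` files listed in `pytorch_model.bin.index.json`. The sketch below, using only the standard library, checks that layout before you hand the directory to `from_pretrained`. `check_local_checkpoint` is a hypothetical helper; `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the environment variables that make the Hugging Face libraries stop trying to reach the network.

```python
import json
import os
from pathlib import Path

# Tell huggingface_hub / transformers not to contact the Hub.
# These must be set before transformers is imported.
os.environ.setdefault("HF_HUB_OFFLINE", "1")
os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")

INDEX_NAME = "pytorch_model.bin.index.json"  # index file for sharded .bin checkpoints
SINGLE_NAME = "pytorch_model.bin"            # file name of an unsharded checkpoint


def check_local_checkpoint(model_dir):
    """Return the .bin files a local model directory needs; raise if any are missing."""
    model_dir = Path(model_dir)
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json in {model_dir}")
    index = model_dir / INDEX_NAME
    if index.is_file():
        # Sharded checkpoint: "weight_map" maps each parameter name to its shard file.
        weight_map = json.loads(index.read_text())["weight_map"]
        shards = sorted(set(weight_map.values()))
    elif (model_dir / SINGLE_NAME).is_file():
        shards = [SINGLE_NAME]
    else:
        raise FileNotFoundError(f"neither {INDEX_NAME} nor {SINGLE_NAME} in {model_dir}")
    missing = [s for s in shards if not (model_dir / s).is_file()]
    if missing:
        raise FileNotFoundError(f"missing shards in {model_dir}: {missing}")
    return [model_dir / s for s in shards]
```

If the check passes, the same directory path can be used as `model_id` in the Falcon snippet, optionally with `local_files_only=True` on the `from_pretrained` calls. The tokenizer files also need to be in that directory.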

My laptop has an NVIDIA RTX 4090 with 16 GB of memory, which is enough to run both of the models above locally.
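Whether a given GPU can hold a model can be roughly checked with arithmetic: the weights alone take (number of parameters) × (bits per weight) / 8 bytes. The sketch below assumes a nominal 7 billion parameters for both models (the exact counts differ slightly), 16 bits per weight for the bfloat16 Falcon pipeline, and 4.5 bits per weight for the q4_0 Mistral file (llama.cpp's q4_0 packs 32 weights into 18 bytes: 16 bytes of 4-bit values plus one fp16 scale). It ignores activations, the KV cache and framework overhead, so treat the result as a lower bound.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB (no activations or KV cache)."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal 7B parameters; bfloat16 = 16 bits/weight, q4_0 = 18 bytes per 32 weights = 4.5 bits.
for label, bits in [("bfloat16", 16), ("q4_0", 4.5)]:
    print(f"7B params @ {label}: {weight_memory_gib(7e9, bits):.1f} GiB")
```

This gives roughly 13 GiB for the bfloat16 weights and about 3.7 GiB for the q4_0 file, which is consistent with both fitting on a 16 GB card, though the bfloat16 model leaves little headroom for long generations.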


Statement:
This article is reproduced from stackoverflow.com. In case of any infringement, please contact admin@php.cn to have it deleted.