
Table of contents

Resources

open LLM Leaderboard

ollama API docs

ollama model library

where to set up ollama

There are a couple of ways to run the Ollama API, depending on the purpose:

| purpose | where to set up | who can use it | active time | docker or native install | access |
| --- | --- | --- | --- | --- | --- |
| self test or small single job | any computer | local user only | unlimited | both work; native install is slightly easier | curl localhost:11434 |
| large single job | BioHPC with GPU | any computer on the UTSW network, including VPN | 20 hours | already installed | localhost:11434 from the BioHPC terminal, or cluster IP address:11434 on the UTSW network including VPN |
| production | any computer on campus with a wired connection, running 24/7, preferably with GPU | any computer on the UTSW network, including VPN | unlimited | Docker only; the port cannot be accessed via IP address if installed natively | IP address:11434 |

In many situations the ollama command does not work (e.g. ollama pull llama2); as an alternative, use curl to make the same changes through the API:

$ curl --noproxy '*' http://localhost:11434/api/pull -d '{ "name": "llama2"}'

The --noproxy '*' option is necessary on BioHPC to avoid the request being blocked by the firewall.
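If you do not want to type --noproxy '*' in every command, the same effect can be achieved for the whole shell session with the standard proxy environment variables (a minimal sketch; adjust the host list to your environment):

# tell curl and most CLI tools not to send localhost traffic through the proxy
export no_proxy=localhost,127.0.0.1
export NO_PROXY=$no_proxy

# plain curl calls to the local Ollama server now bypass the proxy
curl http://localhost:11434/api/tags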

how to set up ollama

set up ollama on biohpc

  1. Start with a web terminal, an on-demand JupyterLab session (with GPU), or an on-demand RStudio session.

  2. In the terminal, check the available Ollama versions with $ module avail ollama

     You should see something like ollama/0.1.20-img

  3. Load Ollama: $ module load ollama/0.1.20-img

  4. Start Ollama: $ ollama serve &

     or start Ollama with a customized model storage location:

     OLLAMA_MODELS=/work/radiology/yxi/ollama ollama serve &

  5. Use $ ifconfig to get the IP address for access from other computers on campus/VPN.

It takes a while...
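For a large single job, the same steps can be wrapped in a batch script so the server stays up for the whole allocation. This is only a rough sketch: the partition name, walltime, and model storage path are placeholders to adapt to your BioHPC account.

#!/bin/bash
#SBATCH --job-name=ollama-serve
#SBATCH --partition=GPU            # placeholder: use your BioHPC GPU partition
#SBATCH --time=20:00:00            # matches the 20-hour limit noted above

module load ollama/0.1.20-img

# keep models on /work so they persist between jobs (example path)
export OLLAMA_MODELS=/work/radiology/yxi/ollama

# start the server in the background and print this node's IP for clients
ollama serve &
sleep 10
hostname -i

# keep the job alive so the server keeps running for the allocated time
wait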

set up ollama on your own machine

For local use, just follow the official instructions.

To run it as a server:

$ docker pull ollama/ollama
$ docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
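If the machine has an NVIDIA GPU and the NVIDIA Container Toolkit installed, the container can also be started with GPU access (a sketch following the official Ollama Docker instructions):

$ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama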

common usage

check available LLMs

$ curl http://localhost:11434/api/tags
$ curl --noproxy '*' http://localhost:11434/api/tags
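The tags endpoint returns JSON; if jq is available, the installed model names can be listed directly (a small convenience sketch):

$ curl -s --noproxy '*' http://localhost:11434/api/tags | jq -r '.models[].name'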

pull models

$ curl http://localhost:11434/api/pull -d '{ "name": "llama3"}'

$ curl --noproxy '*' http://localhost:11434/api/pull -d '{ "name": "llama3"}'
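By default the pull endpoint streams progress objects; according to the Ollama API docs, adding "stream": false makes curl block and return a single status object once the download finishes, which is convenient in scripts:

$ curl --noproxy '*' http://localhost:11434/api/pull -d '{ "name": "llama3", "stream": false }'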

Install ANY GGUF model from Hugging Face

  1. Download the GGUF file from Hugging Face using wget and save it to \blobs; recommended quantization types are q4_K_M or q8_0 (ref).
  2. Run $ sha256sum blablabla.gguf and note the sha256 value.
  3. Rename blablabla.gguf to sha256-...
  4. Alternatively (instead of saving into \blobs and renaming), upload the file as a blob through the API: $ curl --noproxy '*' -T blablabla.gguf -X POST localhost:11434/api/blobs/sha256:...
  5. Run this in the terminal: $ curl --noproxy '*' localhost:11434/api/create -d '{"model":"blablabla","files":{"blablabla.gguf": "sha256-..."}}'
  6. The model should show up in api/tags. A combined sketch of these steps is shown below.
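The list above can be combined into a short shell sketch. The download URL and model name blablabla are placeholders, and this follows the blob-upload route from step 4; whether the digest in the create call is written sha256:... or sha256-... may depend on your Ollama version, so check the API docs linked above.

# 1. download a quantized GGUF from Hugging Face (placeholder URL)
wget https://huggingface.co/ORG/REPO/resolve/main/blablabla.q4_K_M.gguf -O blablabla.gguf

# 2. compute the digest the server will expect
DIGEST=$(sha256sum blablabla.gguf | cut -d' ' -f1)

# 3. upload the blob to the running Ollama server
curl --noproxy '*' -T blablabla.gguf -X POST "localhost:11434/api/blobs/sha256:${DIGEST}"

# 4. create a model that references the uploaded blob
curl --noproxy '*' localhost:11434/api/create \
  -d "{\"model\":\"blablabla\",\"files\":{\"blablabla.gguf\":\"sha256:${DIGEST}\"}}"

# 5. confirm the model is listed
curl --noproxy '*' http://localhost:11434/api/tags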

test model connection

$ curl 129.112.204.51:11434/api/generate -d '{"model": "llama3","prompt": "how are you", "stream": false}'
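To see only the generated text instead of the full JSON envelope, the response field can be extracted with jq (assuming jq is installed; the IP address is the example server above):

$ curl -s 129.112.204.51:11434/api/generate -d '{"model": "llama3","prompt": "how are you", "stream": false}' | jq -r '.response'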

example in R

library(curl)
library(jsonlite)

# build the JSON request body for the /api/generate endpoint
changePrompt <- function(new_system,new_prompt) {
  data <- sprintf('{
    "model": "llama2",
    "prompt": "%s",
    "temperature": 0.1,
    "system": "%s",
    "stream": false
  }', new_prompt,new_system)
  return(data)
}

# send a prompt to the Ollama server and return only the generated text
ask_llama2 <- function(text,sys){
  url <- "http://pedrosa-all-series.dhcp.swmed.org:11434/api/generate"
  
  # Send a POST request using curl
  response <- curl_fetch_memory(
    url = url,
    handle = new_handle(
      post = TRUE,
      postfields = changePrompt(sys, text)
    )
  )
  
  # Extract the response content
  response_content <- rawToChar(response$content)
  fromJSON(response_content)$response
}

# example
ask_llama2(text = "how are you?", sys = 'day dreamer')