Part 4 - Hands-On with Ollama


Running Your First AI Model

You’ve installed Ollama and downloaded your models - now comes the moment of truth. It’s time to bring your personal AI assistant to life and see what it can do. This is where theory turns into practice, and you begin to experience the power of having AI running locally on your own machine.

Basic Interaction: Your First Conversation

Let’s start with the simplest way to interact with your model. Open your terminal and type:

ollama run llama3.2

After a brief loading period (the first run takes longer because the model has to be loaded into memory), you’ll be dropped into an interactive prompt. Type anything and press Enter:

>>> What are three interesting applications of large language models?

Just like that, you’re having a conversation with a sophisticated AI running entirely on your own hardware! The model will generate a thoughtful response about AI applications without sending your query to any external service.

When you’re done chatting, type /bye or press Ctrl+D to end the session.

Command-Line Options for Better Results

The basic command works, but you can significantly improve your experience with a few simple options:

ollama run llama3.2 --verbose

The --verbose flag prints performance statistics after each response, including:

  • Model load time and prompt evaluation time
  • Token generation speed (tokens/second)
  • Token counts for the prompt and the response

One-Shot Queries: Quick Answers Without Conversation

Sometimes you just need a quick answer without starting a whole conversation. Ollama makes this easy:

ollama run llama3.2 "Explain quantum computing in simple terms"

This runs the model, processes your query, returns the response, and exits - perfect for scripts or quick information needs.
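Because it exits cleanly, the one-shot form drops neatly into shell scripts. Here’s a minimal sketch (the topic variable and output file name are just illustrations):

# Capture a one-shot answer in a shell variable and reuse it
topic="quantum computing"
answer=$(ollama run llama3.2 "Explain ${topic} in simple terms")

# Print it and keep a copy on disk
echo "${answer}"
echo "${answer}" > explanation.txt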

Context and Conversation Management

Large language models shine when they can maintain context. Within an interactive session, Ollama keeps the conversation history for you automatically. To give a model persistent behavior across sessions, you can bake a system prompt into a custom variant by creating a Modelfile:

# Create a Modelfile with a custom system prompt
cat << EOF > Modelfile
FROM llama3.1
SYSTEM "You are a knowledgeable research assistant. Provide factual, accurate information with citations where possible. If you're unsure about something, acknowledge the limitations of your knowledge."
EOF

# Create a custom model with this system prompt
ollama create knowledge-assistant -f Modelfile

# Run your custom model
ollama run knowledge-assistant

This approach creates a custom model variant with your specified system prompt. You can then use this model whenever you need that particular context.

For knowledge base applications, this gives you a consistent, specialized assistant that’s pre-configured for your information retrieval needs.
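The custom model works with the one-shot pattern from earlier just as well as in an interactive session (the question and file name here are only illustrative):

# One-shot query against the custom assistant, saved to a file
ollama run knowledge-assistant "What are the trade-offs between keyword search and semantic search?" > notes.txt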

Performance Optimization in Action

One way to tune your model’s performance is to create custom model variants with specific parameters and compare them. Here’s how to compare two settings side by side:

# Create a standard model
cat << EOF > StandardModel
FROM llama3.2
EOF

# Create an optimized model
cat << EOF > OptimizedModel
FROM llama3.2
PARAMETER num_ctx 512
PARAMETER seed 42
EOF

# Create both models
ollama create standard-model -f StandardModel
ollama create optimized-model -f OptimizedModel

# Compare performance
time ollama run standard-model "Write a short poem about AI" --verbose
time ollama run optimized-model "Write a short poem about AI" --verbose

The optimized model sets:

  • num_ctx: a smaller context window, which reduces memory use and can speed up prompt processing
  • seed: a fixed seed, which doesn’t affect speed but makes responses reproducible so the comparison is fair

On machines with limited resources, the smaller context window alone can produce a noticeable improvement in generation speed.
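If you just want to experiment without writing Modelfiles, recent Ollama versions also let you adjust parameters on the fly inside an interactive session with /set parameter (the exact set of supported parameters can vary by version):

# Start an interactive session, then tweak parameters before prompting
ollama run llama3.2
>>> /set parameter num_ctx 512
>>> /set parameter seed 42
>>> Write a short poem about AI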

Real-World Example: Building a Simple Knowledge Query

Let’s put it all together with a practical example for our knowledge base system:

# First, create a specialized model for research queries
cat << EOF > ResearchAssistant
FROM llama3.1
SYSTEM "You are a research assistant helping to analyze scientific information. Provide accurate, concise answers."
PARAMETER temperature 0.2
EOF

# Create the custom model
ollama create research-assistant -f ResearchAssistant

# Now run a query
ollama run research-assistant "What are the primary mechanisms behind climate change, and what evidence supports these mechanisms? Organize your answer with clear headings."

This approach:

  1. Creates a specialized version of the 8B-parameter Llama 3.1 model
  2. Sets a system prompt to frame all interactions
  3. Configures a low temperature for factual responses
  4. Uses the custom model to ask a complex knowledge-based question
  5. Requests specific formatting (headings)

The result will be a well-structured, factual response generated entirely on your local machine!

Troubleshooting Common Issues

If you encounter slow responses or out-of-memory errors:

  1. For slow responses: Try a smaller or more quantized model:

     ollama run llama3.2:1b
    
  2. For memory errors: Reduce context size by updating your Modelfile.

     PARAMETER num_ctx 512
    
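Whichever issue you hit, it helps to see what Ollama is actually working with: ollama ps shows the models currently loaded into memory, and ollama list shows everything you’ve downloaded.

# Models currently loaded into memory
ollama ps

# All models downloaded to this machine
ollama list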

Taking It Further

Once you’re comfortable with basic interactions, you can start exploring more advanced uses:

  • Save conversations: Redirect output to files with ollama run llama3.2 "Your query" > response.txt
  • Batch processing: Create scripts that process multiple queries (a small sketch follows this list)
  • Integration with other tools: Pipe information between Ollama and other command-line tools
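Here’s a small batch-processing sketch. It assumes a queries.txt file with one question per line and the research-assistant model created earlier; adapt the names to your own setup.

#!/usr/bin/env bash
# Read one query per line from queries.txt and append each answer to answers.txt
while IFS= read -r query; do
  [ -z "$query" ] && continue              # skip blank lines
  echo "== $query ==" >> answers.txt
  ollama run research-assistant "$query" >> answers.txt
  echo >> answers.txt
done < queries.txt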

With your model up and running, you’ve crossed the threshold from setup to actual use. Your personal AI lab is now operational! In our next post, we’ll explore how to build a more structured knowledge base system that can intelligently answer questions from your stored information.
