Latest

Thoughts on software development and life.

  • Published on
    Most AI applications, whether a RAG-based chatbot or a simple model wrapper, rely on prompts to generate responses. User or application inputs are converted into prompts, which are then fed to the underlying model. Because of how these models work, the quality of a response depends heavily on the quality of the prompt. We can test prompts manually to some extent, but that does not scale. In this article I will discuss Latitude, a prompt engineering platform that helps with refining prompts, A/B testing them, and measuring their performance.
  • Published on
    Last week I blogged about how quantization can help you run your models on lower-powered hardware. In today's post I extend that discussion to ONNX (Open Neural Network Exchange), which provides a standard format for representing machine learning models. This enables interoperability between frameworks and simplifies deployment across diverse hardware, including browser-based inference with onnxruntime-web. I have also included a demo that runs a model in the browser.
  • Published on
    When storing data in memory, the data type used to represent it affects both memory usage and the performance of the overall system. Consider saving a number. At a high level, the number can be either an integer (a whole number) or a floating-point number (a number with a fractional part). Floating-point numbers can represent fractional values and a much wider range of magnitudes than integers of the same width. Weights and biases in a large language model, which are learned during training and used to make predictions, are stored as floating-point numbers to maintain high precision. The number of these parameters, together with the data type used to store each one, determines the model's size, its memory usage, and the computational resources needed to run it. In this post, we will discuss how quantization can be used to reduce the memory usage of models and improve performance (assuming the loss of precision is acceptable).
  • Published on
    In one of my previous articles, I discussed why and how to adopt Infrastructure as Code (IaC) to manage your cloud infrastructure efficiently. Several tools and frameworks are available for IaC, including Terraform, Pulumi, Ansible, and Puppet. These tools let you define and manage your infrastructure as code, enabling automation, repeatability, and scalability in your cloud environment. In this article I want to discuss OpenTofu, an open-source alternative to Terraform that has recently gained popularity.