Matcha: Building a Local-First AI Resume–JD Matching Engine with Spring AI

Introduction

Building an AI application as a backend developer no longer requires pivoting to a new language or managing complex cloud infrastructure. By leveraging Spring AI, you can treat a Large Language Model (LLM) as just another service in your ecosystem.

Matcha was prototyped and polished in just 3–4 hours. This speed is possible because Spring AI abstracts the "AI complexity" into familiar POJO-based patterns, allowing for rapid iteration: tuning prompts and refining logic in minutes rather than days.

To ensure a systematic engineering defense of the architecture, I applied the S.C.A.L.E. Framework. This framework turns the chaos of open-ended design into a structured, defensible plan by focusing on trade-offs rather than just components.

S: Scope and Size

Let's begin by defining the requirements (the MVP) and the constraints that set the project boundary for a local-first recruitment tool.

Functional Requirements (FR):

  1. The user can upload a resume (PDF).
  2. The user can input a job description (JD) link from any career website (e.g., LinkedIn, Greenhouse).
  3. On clicking "Analyze", the server extracts text from both the resume and the JD.
  4. The server sends both texts to the locally running Llama3.2 model for analysis.
  5. If the score exceeds a threshold, the server sends an email notification.
  6. Resume is cached for the session, allowing users to try another JD without re-uploading.
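
Taken together, steps 3 to 5 form a small pipeline. The sketch below is a minimal plain-Java outline of that flow under stated assumptions: AnalyzePipeline, MatchResult, and the functional stand-ins for the Tika, Jsoup, Spring AI, and SMTP services are illustrative names, not Matcha's actual API.

```java
import java.util.function.Function;

/** Hypothetical result type returned by the model layer. */
record MatchResult(int score, String summary) {}

/** Illustrative orchestration of FR steps 3-5; the collaborators are stand-ins. */
class AnalyzePipeline {
    private final Function<byte[], String> resumeExtractor;  // stand-in for the Tika service
    private final Function<String, String> jdFetcher;        // stand-in for the Jsoup service
    private final Function<String, MatchResult> modelClient; // stand-in for the Spring AI call
    private final int threshold;

    AnalyzePipeline(Function<byte[], String> resumeExtractor,
                    Function<String, String> jdFetcher,
                    Function<String, MatchResult> modelClient,
                    int threshold) {
        this.resumeExtractor = resumeExtractor;
        this.jdFetcher = jdFetcher;
        this.modelClient = modelClient;
        this.threshold = threshold;
    }

    /** Extracts both texts, scores them, and fires the notifier when the score clears the threshold. */
    MatchResult analyze(byte[] resumePdf, String jdUrl, Runnable notifier) {
        String resumeText = resumeExtractor.apply(resumePdf);
        String jdText = jdFetcher.apply(jdUrl);
        MatchResult result = modelClient.apply(resumeText + "\n---\n" + jdText);
        if (result.score() >= threshold) {
            notifier.run(); // e.g. the SMTP email service
        }
        return result;
    }
}
```

Keeping the collaborators behind functional interfaces like this also makes the flow trivial to unit-test with lambdas, no model or mail server required.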

Non-Functional Requirements (Non-FR):

  1. The server must handle all exceptions gracefully.
  2. Resume caching should prevent repetitive uploads for the same session.
  3. The entire system runs locally (no external cloud dependencies).
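
Non-FR 2 (session-scoped resume caching) can be as simple as a concurrent map keyed by session id. A minimal in-memory sketch; ResumeCache and its methods are hypothetical names, and a real Spring app might lean on HttpSession instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal in-memory cache of extracted resume text, keyed by session id.
 * Lets a user try several JDs against one uploaded resume (Non-FR 2).
 */
class ResumeCache {
    private final Map<String, String> bySession = new ConcurrentHashMap<>();

    void put(String sessionId, String resumeText) {
        bySession.put(sessionId, resumeText);
    }

    /** Returns the cached text, if this session already uploaded a resume. */
    Optional<String> get(String sessionId) {
        return Optional.ofNullable(bySession.get(sessionId));
    }

    void evict(String sessionId) {
        bySession.remove(sessionId);
    }
}
```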

C: Component Topology

The system skeleton is built as a series of decoupled services, separating concerns based on data type and action.

  • Frontend WebApp: A React-based interface where the user interacts with the system.

  • Backend Server: The Spring Boot core manages the REST API and business logic.

  • Scrape Resume (PDF): A service utilizing Apache Tika to extract text from binary documents.

  • Scrape JD (Web): A Jsoup-based module that fetches and cleans job posting HTML.

  • ModelClient: The Spring AI bridge that communicates with the locally running inference server over its local API.

  • Deserialization: A layer that maps non-deterministic AI strings into typed Kotlin data classes.

  • Email Service: An SMTP notifier triggered automatically by the score validation logic.
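
Of these components, the Deserialization layer is the trickiest, because the model's reply is free-form text that may wrap its JSON in prose or Markdown fences. Below is a minimal defensive-parsing sketch using only the JDK; ScoreResponse and its field names are assumptions for illustration (the real project maps into typed Kotlin data classes, presumably via a JSON mapper).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative typed result; the field names are assumptions, not Matcha's schema. */
record ScoreResponse(int score, String verdict) {}

/** Maps a non-deterministic model reply into a typed value, or empty on failure. */
class ResponseParser {
    // Match the fields directly so extra prose or code fences around the JSON are tolerated.
    private static final Pattern SCORE = Pattern.compile("\"score\"\\s*:\\s*(\\d+)");
    private static final Pattern VERDICT = Pattern.compile("\"verdict\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<ScoreResponse> parse(String raw) {
        Matcher s = SCORE.matcher(raw);
        Matcher v = VERDICT.matcher(raw);
        if (!s.find() || !v.find()) return Optional.empty();
        return Optional.of(new ScoreResponse(Integer.parseInt(s.group(1)), v.group(1)));
    }
}
```

Returning Optional instead of throwing keeps the non-determinism of the model contained in one place, which is exactly the decoupling this topology aims for.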


A: Algorithmic Deep Dive

This section addresses specific bottlenecks and solves them with logical patterns.

  1. Context Window Management: To ensure stability, the system verifies that the combined word count of the extracted JD and resume text does not exceed a 12,000-word limit. If it does, the text is truncated to the limit before being sent to the AI, preventing a processing failure.

  2. Notification Logic: The system acts as a deterministic gate. It evaluates the AI's response and sends an email only if the score is equal to or above the threshold. 📧
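
Both rules are deterministic and fit in a few lines. A sketch assuming whitespace-delimited word counting; only the 12,000-word cap and the threshold comparison come from the text above, while the class and method names are illustrative.

```java
import java.util.Arrays;

/** Illustrative helpers for the two deterministic rules above. */
class AnalysisRules {
    static final int WORD_LIMIT = 12_000;

    /** Caps the combined input at WORD_LIMIT whitespace-delimited words before it reaches the model. */
    static String truncateToWordLimit(String text) {
        String[] words = text.trim().split("\\s+");
        if (words.length <= WORD_LIMIT) return text;
        return String.join(" ", Arrays.copyOf(words, WORD_LIMIT));
    }

    /** Deterministic notification gate: email only at or above the threshold. */
    static boolean shouldNotify(int score, int threshold) {
        return score >= threshold;
    }
}
```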

L: Load Optimization

Handling growth and hardware spikes is critical for a smooth user experience.

  • Hardware Efficiency: The entire setup is designed to run on a self-owned machine.

  • Performance Benchmark: The stack is tuned to avoid hardware "hiccups"; it was tested on an M1 Pro with 8GB RAM and ran smoothly. 💻

E: Evaluation & Errors

Final "Sanity Checks" ensure the design survives failures and validates the initial constraints.

  • Error Strategy (Resilience): The system proactively validates inputs and handles edge cases (such as malformed URLs or empty PDFs) so the user experience remains stable.

  • Validation: Every release is evaluated against all defined Functional Requirements (FR) to ensure the end-to-end flow from "Analyze" to "Email notification" remains robust.
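
The proactive validation described above can be sketched with plain JDK checks; InputValidator and its method names are hypothetical, and the real service would surface these failures as friendly API errors rather than booleans.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative guards for the two edge cases called out above. */
class InputValidator {
    /** True only for absolute http(s) URLs, rejecting malformed JD links early. */
    static boolean isValidJdUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.isAbsolute() && ("http".equals(scheme) || "https".equals(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Guards against PDFs whose text extraction produced nothing usable. */
    static boolean hasUsableText(String extracted) {
        return extracted != null && !extracted.isBlank();
    }
}
```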

Closing Thoughts

As demonstrated by Matcha, the transition from traditional backend engineering to AI-enabled development is about mastering the orchestration of local models within a structured framework like S.C.A.L.E. By keeping inference local, we prioritize data privacy and eliminate cloud costs without sacrificing the intelligence of the application.

Explore the project on GitHub: https://github.com/baidyanathprasad/matcha 🚀
