Matcha: Building a Local-First AI Resume–JD Matching Engine with Spring AI

Introduction

Building an AI application as a backend developer no longer requires pivoting to a new language or managing complex cloud infrastructure. By leveraging Spring AI, you can treat a Large Language Model (LLM) as just another service in your ecosystem.

Matcha was prototyped and polished in just 3–4 hours. This speed is possible because Spring AI abstracts the "AI complexity" into familiar POJO-based patterns, allowing for rapid iteration: tuning prompts and refining logic in minutes rather than days.

To ensure a systematic engineering defense of the architecture, I applied the S.C.A.L.E. Framework. This framework turns the chaos of open-ended design into a structured, defensible plan by focusing on trade-offs rather than just components.

S: Scope and Size

Let's begin by defining the requirements (the MVP) and the constraints that set the project boundary for a local-first recruitment tool.

Functional Requirements (FR):

  1. The user can upload a resume (PDF).
  2. The user can input a job description (JD) link from any career website (e.g., LinkedIn, Greenhouse).
  3. On clicking "Analyze", the server extracts text from both the resume and the JD.
  4. The server sends both texts to the locally running Llama3.2 model for analysis.
  5. If the score exceeds a threshold, the server sends an email notification.
  6. Resume is cached for the session, allowing users to try another JD without re-uploading.
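
Taken together, steps 3 to 5 form a small pipeline. The sketch below is a minimal plain-Java outline of that flow under stated assumptions: AnalyzePipeline, MatchResult, and the functional stand-ins for the Tika, Jsoup, Spring AI, and SMTP services are illustrative names, not Matcha's actual API.

```java
import java.util.function.Function;

/** Hypothetical result type returned by the model layer. */
record MatchResult(int score, String summary) {}

/** Illustrative orchestration of FR steps 3-5; the collaborators are stand-ins. */
class AnalyzePipeline {
    private final Function<byte[], String> resumeExtractor;  // stand-in for the Tika service
    private final Function<String, String> jdFetcher;        // stand-in for the Jsoup service
    private final Function<String, MatchResult> modelClient; // stand-in for the Spring AI call
    private final int threshold;

    AnalyzePipeline(Function<byte[], String> resumeExtractor,
                    Function<String, String> jdFetcher,
                    Function<String, MatchResult> modelClient,
                    int threshold) {
        this.resumeExtractor = resumeExtractor;
        this.jdFetcher = jdFetcher;
        this.modelClient = modelClient;
        this.threshold = threshold;
    }

    /** Extracts both texts, scores them, and fires the notifier when the score clears the threshold. */
    MatchResult analyze(byte[] resumePdf, String jdUrl, Runnable notifier) {
        String resumeText = resumeExtractor.apply(resumePdf);
        String jdText = jdFetcher.apply(jdUrl);
        MatchResult result = modelClient.apply(resumeText + "\n---\n" + jdText);
        if (result.score() >= threshold) {
            notifier.run(); // e.g. the SMTP email service
        }
        return result;
    }
}
```

Keeping the collaborators behind functional interfaces like this also makes the flow trivial to unit-test with lambdas, no model or mail server required.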

Non-Functional Requirements (Non-FR):

  1. The server must handle all exceptions gracefully.
  2. Resume caching should prevent repetitive uploads for the same session.
  3. The entire system runs locally (no external cloud dependencies).
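
Non-FR 2 (session-scoped resume caching) can be as simple as a concurrent map keyed by session id. A minimal in-memory sketch; ResumeCache and its methods are hypothetical names, and a real Spring app might lean on HttpSession instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal in-memory cache of extracted resume text, keyed by session id.
 * Lets a user try several JDs against one uploaded resume (Non-FR 2).
 */
class ResumeCache {
    private final Map<String, String> bySession = new ConcurrentHashMap<>();

    void put(String sessionId, String resumeText) {
        bySession.put(sessionId, resumeText);
    }

    /** Returns the cached text, if this session already uploaded a resume. */
    Optional<String> get(String sessionId) {
        return Optional.ofNullable(bySession.get(sessionId));
    }

    void evict(String sessionId) {
        bySession.remove(sessionId);
    }
}
```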

C: Component Topology

The system skeleton is built as a series of decoupled services, separating concerns based on data type and action.

  • Frontend WebApp: A React-based interface where the user interacts with the system.

  • Backend Server: The Spring Boot core manages the REST API and business logic.

  • Scrape Resume (PDF): A service utilizing Apache Tika to extract text from binary documents.

  • Scrape JD (Web): A Jsoup-based module that fetches and cleans job posting HTML.

  • ModelClient: The Spring AI bridge that communicates with the locally running inference server over its local API.

  • Deserialization: A layer that maps non-deterministic AI strings into typed Kotlin data classes.

  • Email Service: An SMTP notifier triggered automatically by the score validation logic.
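
Of these components, the Deserialization layer is the trickiest, because the model's reply is free-form text that may wrap its JSON in prose or Markdown fences. Below is a minimal defensive-parsing sketch using only the JDK; ScoreResponse and its field names are assumptions for illustration (the real project maps into typed Kotlin data classes, presumably via a JSON mapper).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative typed result; the field names are assumptions, not Matcha's schema. */
record ScoreResponse(int score, String verdict) {}

/** Maps a non-deterministic model reply into a typed value, or empty on failure. */
class ResponseParser {
    // Match the fields directly so extra prose or code fences around the JSON are tolerated.
    private static final Pattern SCORE = Pattern.compile("\"score\"\\s*:\\s*(\\d+)");
    private static final Pattern VERDICT = Pattern.compile("\"verdict\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<ScoreResponse> parse(String raw) {
        Matcher s = SCORE.matcher(raw);
        Matcher v = VERDICT.matcher(raw);
        if (!s.find() || !v.find()) return Optional.empty();
        return Optional.of(new ScoreResponse(Integer.parseInt(s.group(1)), v.group(1)));
    }
}
```

Returning Optional instead of throwing keeps the non-determinism of the model contained in one place, which is exactly the decoupling this topology aims for.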


A: Algorithmic Deep Dive

This section addresses specific bottlenecks and solves them with logical patterns.

  1. Context Window Management: To ensure stability, the system verifies that the combined word count of the extracted JD and resume text does not exceed a 12,000-word limit. If it does, the text is truncated to the limit before being sent to the AI, preventing a processing failure.

  2. Notification Logic: The system acts as a deterministic gate. It evaluates the AI's response and sends an email only if the score is equal to or above the threshold. 📧
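
Both rules are deterministic and fit in a few lines. A sketch assuming whitespace-delimited word counting; only the 12,000-word cap and the threshold comparison come from the text above, while the class and method names are illustrative.

```java
import java.util.Arrays;

/** Illustrative helpers for the two deterministic rules above. */
class AnalysisRules {
    static final int WORD_LIMIT = 12_000;

    /** Caps the combined input at WORD_LIMIT whitespace-delimited words before it reaches the model. */
    static String truncateToWordLimit(String text) {
        String[] words = text.trim().split("\\s+");
        if (words.length <= WORD_LIMIT) return text;
        return String.join(" ", Arrays.copyOf(words, WORD_LIMIT));
    }

    /** Deterministic notification gate: email only at or above the threshold. */
    static boolean shouldNotify(int score, int threshold) {
        return score >= threshold;
    }
}
```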

L: Load Optimization

Handling growth and hardware spikes is critical for a smooth user experience.

  • Hardware Efficiency: The entire setup is designed to run on a self-owned machine.

  • Performance Benchmark: The stack is tuned to avoid hardware "hiccups"; it was tested on an M1 Pro with 8GB RAM and ran smoothly. 💻

E: Evaluation & Errors

Final "Sanity Checks" ensure the design survives failures and validates the initial constraints.

  • Error Strategy (Resilience): The system proactively validates inputs and handles edge cases (such as malformed URLs or empty PDFs) so the user experience remains stable.

  • Validation: Every release is evaluated against all defined Functional Requirements (FR) to ensure the end-to-end flow from "Analyze" to "Email notification" remains robust.
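
The proactive validation described above can be sketched with plain JDK checks; InputValidator and its method names are hypothetical, and the real service would surface these failures as friendly API errors rather than booleans.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative guards for the two edge cases called out above. */
class InputValidator {
    /** True only for absolute http(s) URLs, rejecting malformed JD links early. */
    static boolean isValidJdUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return uri.isAbsolute() && ("http".equals(scheme) || "https".equals(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Guards against PDFs whose text extraction produced nothing usable. */
    static boolean hasUsableText(String extracted) {
        return extracted != null && !extracted.isBlank();
    }
}
```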

Closing Thoughts

As demonstrated by Matcha, the transition from traditional backend engineering to AI-enabled development is about mastering the orchestration of local models within a structured framework like S.C.A.L.E. By keeping inference local, we prioritize data privacy and eliminate cloud costs without sacrificing the intelligence of the application.

Explore the project on GitHub: https://github.com/baidyanathprasad/matcha 🚀
