With Nicolas Bustamante, CEO & Co-founder

Fintool is an AI equity research assistant that helps investors make better decisions by processing large volumes of unstructured financial data, from SEC filings to earnings call transcripts. They serve leading institutional investors such as Kennedy Capital and First Manhattan, as well as companies like PricewaterhouseCoopers.
For institutional investors, trust is paramount, and a single overlooked disclosure can have serious consequences. However, the sheer volume of daily regulatory filings makes it impossible for humans to review every document. Fintool addressed this problem by developing Fintool Feed, a Twitter-like interface where they summarize key sections of documents based on user prompts. Investors select the companies they want to monitor and configure alerts by specifying what type of information they want to be summarized.
However, the team soon realized the need for real-time monitoring to maintain quality and user confidence. They faced a few key challenges:
In this case study, we'll share how Fintool used Braintrust to develop a repeatable evaluation workflow that scales to massive amounts of data while maintaining trust in high-stakes financial contexts.

Fintool makes sure every insight includes a reliable source, like an SEC document ID, and automatically flags anything that’s missing or doesn’t look right. This is a big deal in finance, where trust comes down to having data you can verify.
They don’t just check that sources are included. They also make sure they’re valid, properly formatted, and tied directly to the insights. The team set up custom rules in Braintrust, like requiring SEC IDs and double-checking quoted text, and real-time monitoring catches anything that doesn’t meet the standards.
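As a rough illustration of the kind of rule described above, the sketch below checks that an insight carries a well-formed SEC ID and that its quoted text actually appears in the source document. It is not Fintool's implementation: the `Insight` fields and the `check_citation` helper are hypothetical, and it assumes the "SEC document ID" is an EDGAR accession number.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the SEC document ID is an EDGAR accession number, written as
# 10 digits, 2 digits, 6 digits (for example 0000320193-23-000106).
ACCESSION_RE = re.compile(r"^\d{10}-\d{2}-\d{6}$")


@dataclass
class Insight:
    """Hypothetical shape of a generated insight with its citation."""
    summary: str
    sec_id: Optional[str]
    quote: Optional[str]


def normalize(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not fail the quote check."""
    return " ".join(text.split()).lower()


def check_citation(insight: Insight, source_text: str) -> list[str]:
    """Return a list of problems; an empty list means the citation passed."""
    problems = []
    if not insight.sec_id:
        problems.append("missing SEC ID")
    elif not ACCESSION_RE.match(insight.sec_id):
        problems.append("malformed SEC ID")
    if not insight.quote:
        problems.append("missing quote")
    elif normalize(insight.quote) not in normalize(source_text):
        problems.append("quote not found in source")
    return problems
```

A check like this is deterministic and cheap, so it can run on every insight before any LLM-based scoring; anything it flags can be surfaced for review.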

Fintool also uses span iframes to show citations within trace spans, so expert reviewers can quickly validate the content.

Fintool leverages Braintrust’s tools to benchmark the quality of LLM outputs in real time. The engineering team crafts golden datasets tailored to specific industries and document types, like healthcare compliance or tech KPIs.
The golden datasets are built by combining production logs with handpicked examples that reflect real-world scenarios, which helps the datasets stay fresh as Fintool processes over 1.5 billion tokens across 70 million data chunks daily.
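One way to picture this combination step is a merge that keeps every handpicked example and tops the dataset up with a capped number of production records. The sketch below is only an assumption about how such a merge could look: the record layout (`input`, `expected`), the `origin` tag, and the `max_from_logs` cap are all hypothetical, and it uses plain Python rather than any Braintrust API.

```python
import hashlib
import json


def record_key(record: dict) -> str:
    """Stable key derived from the input, so the same example is not added twice."""
    payload = json.dumps(record["input"], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_golden_dataset(
    curated: list[dict], production_logs: list[dict], max_from_logs: int
) -> list[dict]:
    """Merge handpicked examples with a capped number of production records."""
    dataset: dict[str, dict] = {}

    # Handpicked examples always win over production records with the same input.
    for rec in curated:
        dataset[record_key(rec)] = {**rec, "origin": "curated"}

    added = 0
    for rec in production_logs:
        if added >= max_from_logs:
            break
        key = record_key(rec)
        # Skip duplicates and records that nobody has labeled yet.
        if key in dataset or rec.get("expected") is None:
            continue
        dataset[key] = {**rec, "origin": "production"}
        added += 1

    return list(dataset.values())
```

Rebuilding the dataset on a schedule with recent logs is what would keep it fresh in this sketch; the cap stops production traffic from drowning out the curated cases.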
Each generated insight is evaluated using LLM-as-a-judge scorers on key metrics like accuracy, relevance, and completeness. Braintrust automatically updates whenever Fintool adjusts prompts or ingests new data, preventing surprise regressions and saving valuable engineering resources.
```python
from autoevals import LLMClassifier

# autoevals renders prompt templates with Mustache, so the scored text is
# referenced as {{output}} (double braces).
FORMAT_PROMPT = """You are a format validator. Check if the following text follows this format:
1. A short business description paragraph
2. Followed by a markdown numbered list of product lines, where each bullet point:
- Starts with the product name
- Contains a short description of the product line
Text to validate:
<text>
{{output}}
</text>
Respond with:
"PASS" if it follows the format perfectly
"FAIL" if it deviates from the format"""

# Map the judge's answer to a numeric score: PASS -> 1, FAIL -> 0.
format_quality = LLMClassifier(
    name="Format Check",
    prompt_template=FORMAT_PROMPT,
    choice_scores={"PASS": 1, "FAIL": 0},
)
```
Using automated scoring functions frees up bandwidth for human reviewers to focus on the toughest cases.
When content gets a low score or is downvoted, a human expert is immediately notified to step in. They can approve, reject, or edit the Markdown to fix issues like poor formatting. Since the Fintool database is linked directly to Braintrust, the expert can update the live content right from the Braintrust UI.

This quick response means that problems are caught and corrected as soon as possible.
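The escalation rule described above (a low score or a downvote triggers human review) can be sketched as a small routing function. Everything here is illustrative rather than Fintool's actual logic: the `ScoredInsight` type, the `0.5` threshold, and the choice to review the lowest-scoring items first are assumptions.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.5  # hypothetical cutoff; tune per scorer


@dataclass
class ScoredInsight:
    """Hypothetical record pairing an insight with its automated scores."""
    insight_id: str
    scores: dict[str, float]
    downvoted: bool = False


def lowest_score(item: ScoredInsight) -> float:
    # An item with no scores is treated as 0.0, so it is always reviewed.
    return min(item.scores.values(), default=0.0)


def needs_review(item: ScoredInsight, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag an insight if a user downvoted it or any scorer fell below the threshold."""
    return item.downvoted or lowest_score(item) < threshold


def review_queue(items: list[ScoredInsight]) -> list[ScoredInsight]:
    """Return flagged insights, worst-scoring first, for a human expert to handle."""
    flagged = [item for item in items if needs_review(item)]
    return sorted(flagged, key=lowest_score)
```

Using the minimum across scorers, rather than an average, means a single failed check such as formatting is enough to escalate an insight.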
This evaluation workflow has helped Fintool manage millions of LLM-generated insights, improving accuracy, consistency, and efficiency at scale. By streamlining their eval process, Fintool is able to make sure their financial summaries and alerts meet the highest standards of trust and reliability. Key successes include:
Fintool has set a new standard for financial AI, delivering timely and actionable insights with accuracy and efficiency.