Pharos Production collaborated with a taxi aggregator platform to develop a high-load ride-hailing application that connects passengers and drivers in real time. This platform consolidates various fleets and independent drivers into a single system, ensuring quick ride matching, live tracking and transparent pricing. Built on a cloud-native infrastructure, the solution offers low-latency interactions, reliable trip processing and scalability for operations at the city and regional levels.
Reviewed by Dr. Dmytro Nasyrov, Founder and CTO
AI Services
- 90+ engineers
- 18 industries
- 13+ years in business
What is AI development?
Authoritative citations (5 sources)
- IDC: Worldwide enterprise AI spending forecast through 2027 by industry vertical (idc.com, 2024)
- Stanford HAI AI Index: AI Index 2025 tracks training compute, model performance, investment and adoption metrics across the global AI industry (hai.stanford.edu, 2025)
- McKinsey and Company: State of AI surveys global adoption, ROI realisation and risk-management practices across enterprises (mckinsey.com, 2025)
- Gartner: Worldwide GenAI spending forecast tracks platform, services and infrastructure investment (gartner.com, 2025)
- NIST: AI Risk Management Framework provides governance, mapping, measurement and management functions for trustworthy AI systems (nist.gov, 2024)
- AI features where deterministic rules engines would be cheaper and more reliable
- Demos or proofs of concept without a production deployment plan
- Projects expecting fixed quotes without a paid discovery sprint
- Use cases where data privacy requirements rule out cloud LLM APIs and budget cannot cover self-hosted GPU infrastructure
Custom AI vs off-the-shelf LLM SaaS: which is better?
Custom AI is purpose-built around your data, evaluation set and quality gates, while off-the-shelf LLM SaaS is a packaged tool with shared prompts and limited control. According to a 2024 a16z enterprise AI survey, 60% of enterprise AI buyers cite data privacy and accuracy as the top reasons to move from SaaS to custom builds. The right choice depends on data sensitivity, accuracy budget and how unique your workflows are.
| Factor | Custom AI build | Off-the-shelf LLM SaaS |
|---|---|---|
| Data control | Your data stays in your VPC or on-prem; full audit trail | Data sent to vendor; subject to vendor retention policy |
| Accuracy | Tuned on your eval set; measurable accuracy uplift over time | Generic accuracy; no eval loop tied to your domain |
| Latency control | Hosted close to your users; sub-200ms achievable | Bound by vendor regions; cold-start spikes possible |
| Cost at scale | Cost decreases with volume (own GPU or batch inference) | Per-token billing scales linearly; no volume discount cliff |
| Integration | Native integration with your data warehouse, ERP, CRM and observability stack | Webhooks/Zapier or vendor SDK; limited deep integration |
| Compliance | HIPAA, GDPR, SOC 2 controls baked in; documented data flows | Vendor BAA/DPA required; some workloads ineligible |
| Time to first value | 6-12 weeks for an MVP with a working evaluation harness | Days for a basic integration; weeks to harden it |
| Lock-in risk | Open weights, portable prompts, swap providers in days | Vendor lock-in on prompts, evals and pricing model |
Decision support visuals
Original diagrams Pharos Production uses during discovery to frame AI investment decisions. These are our own working artifacts, not reused marketing graphics. Cite them with attribution.
Custom AI vs SaaS decision flow
Three questions that pick the right architecture before you write code.
Cost crossover: SaaS vs custom AI total cost of ownership
Illustrative annual TCO curves at 2026 public cloud and GPU prices. Crossover position shifts with model choice, utilization and discounts.
RAG vs fine-tune decision matrix
Pick the right customization strategy by task narrowness and data freshness.
AI development at Pharos Production at a glance
- AI projects: 25+ production AI systems delivered since 2023 (RAG, agents, vision, NLP)
- Team: 90+ engineers, PhD-led AI practice, ML and MLOps specialists
- Pricing: AI MVP $15,000-$40,000; production RAG/agent systems $40,000-$150,000+. See our 2026 AI development cost research for a breakdown of hidden cost layers
- Timeline: Discovery 2-4 weeks; AI MVP 6-12 weeks; production with eval set and monitoring 4-9 months
- AI quality gates: Eval sets, shadow-mode validation, drift detection, prompt versioning, rollback procedures aligned with NIST AI RMF
- Compliance: Aligned with ISO 27001, SOC 2, GDPR and HIPAA frameworks for healthcare AI. EU AI Act and OWASP LLM Top 10 mapping on request
- Honest scope: We decline ~30% of AI RFPs when deterministic rules engines would solve the problem cheaper
How AI development evolved 2023-2026
The last three years reshaped AI engineering from research experiment to regulated production infrastructure. Prompt engineering on GPT-3.5 became cost-sensitive RAG, then agent orchestration with evaluation harnesses, and now a measurable cost crossover where custom AI beats generic SaaS on recurring enterprise workloads. Each shift brought new guardrails, new regulation and a new generation of models. The milestones below are the ones that changed how we scope, price and ship AI at Pharos Production.
2023: LLM foundation
ChatGPT moves AI from lab to product. RAG and parameter-efficient fine-tuning become the default enterprise patterns.
- OpenAI GPT-4 (Mar 2023) makes multi-step reasoning production-viable.
- Meta Llama 2 (Jul 2023) opens the door to self-hosted enterprise LLMs.
- QLoRA (May 2023) cuts fine-tuning memory cost by 3-4x and makes domain adaptation affordable.
- US NIST AI RMF 1.0 (Jan 2023) and Executive Order 14110 (Oct 2023) put responsible AI on every enterprise checklist.
2024: Agentic and multimodal
Context windows pass 1M tokens, agents gain tool-use reliability and regulation enters into force.
- Anthropic Claude 3.5 Sonnet (Jun 2024) and OpenAI GPT-4o (May 2024) drop API prices 40-60% vs prior generations.
- Google Gemini 1.5 Pro ships a 1M-token context window, enabling whole-codebase and long-document reasoning.
- The EU AI Act (in force since Aug 2024) starts phased obligations for high-risk AI, GPAI transparency and prohibited practices.
- Open agent frameworks (LangGraph, CrewAI, AutoGen) and the OpenAI Realtime API (Oct 2024) make voice and multi-step tool use production-ready.
2025: Enterprise and governance
Reasoning models, open-weight parity and mature MLOps push AI from proof of concept to audited production.
- Anthropic Claude 3.7 Sonnet ships extended thinking; OpenAI o3 and o4-mini formalize reasoning-model tiers.
- DeepSeek R1 (Jan 2025) puts open-weight reasoning within 10-15% of closed frontier models at a fraction of the cost.
- Agentic coding platforms (Claude Code, Cursor, Copilot Workspace) move from autocomplete to multi-file refactors and test generation.
- Epoch AI measures 3-10x annual drops in inference cost for equal-quality models, compounding the 2023-2024 declines.
2026: Cost crossover and custom shift
Custom AI beats SaaS on recurring enterprise workloads. Agents gain a shared tool protocol. Sovereign AI reshapes deployment.
- Custom AI unit economics cross under off-the-shelf SaaS on repeat-workflow use cases; payback windows fall to 4-6 months on mid-volume deployments.
- Model Context Protocol standardizes tool and resource discovery across agents, vendors and IDEs.
- Small Language Models (1-8B params) run on-prem and at the edge for regulated data, long context and sub-100ms latency budgets.
- Sovereign AI frameworks in the EU, India, UAE, Singapore and Saudi Arabia push more workloads to region-scoped or self-hosted inference.
Selected AI projects from data-heavy clients
Our AI practice ships production systems, not demos: PhD-led research direction, a dedicated MLOps team and 25+ AI systems delivered since 2023 across enterprise search, agent orchestration, fraud detection and clinical decision support. We work the full stack: model selection (open-source vs API), retrieval pipelines (RAG, hybrid search, reranking), fine-tuning when warranted, evaluation sets with release gates aligned with the NIST AI RMF, and drift detection in production. We do not paste OpenAI keys onto static templates and call it AI. Every project ships with an offline eval suite, shadow-mode rollout, hallucination guardrails and an MLOps loop that sets the retraining cadence. We routinely advise clients not to use AI when a deterministic rules engine wins on cost and latency, and we say so before quoting. Below are selected AI projects from FinTech, healthcare and other data-heavy clients.
Pharos Production has partnered with Sagas to create a location-aware social platform that enables users to capture, publish, and explore geo-located timelapses over time. This system combines real-time data ingestion, large-scale media processing, and map-centric discovery to transform physical locations into dynamic digital stories. Leveraging cloud-native infrastructure and event-driven architecture, Sagas allows users to document urban changes, natural evolution, and personal moments tied to specific places. The result is a scalable social network where time and location are central to content discovery.
Pharos Production has partnered with Pulse to create a community-driven social network that connects users with local stores through challenges, engagement activities, and real-world prizes. This platform transforms everyday local interactions into interactive experiences, enabling users to earn rewards from participating merchants. Built on a scalable, event-driven architecture, Pulse facilitates real-time interactions between users and businesses and supports rapid growth across cities and regions.
Pharos Production has partnered with Pleenk to build a secure, scalable payments platform for fast transactions, fraud prevention and seamless integration with digital products. The platform processes payment flows in real time while maintaining high levels of security, transparency and reliability for both businesses and end users. Built on cloud-native infrastructure and an event-driven architecture, Pleenk provides a strong foundation for modern digital payments.
Pharos Production partnered with Nextcheck to replace outdated, manual onboarding with a secure, automated KYC/AML platform. Built on AWS, Kubernetes, Istio, Elixir, RabbitMQ, PostgreSQL and NextJS, the platform provides real-time biometric and document verification, risk assessment and compliance reporting. Since 2019, Nextcheck has reduced onboarding time by 60%, cut manual labor by 70% and expanded to support thousands of checks at once. Today, it powers global banks, fintechs and crypto firms with a cloud-native, regulation-ready, growth-oriented compliance platform.
Pharos Production partnered with a healthcare organization to design and build MedCore, a comprehensive electronic health record platform that centralizes patient data, streamlines clinical workflows and ensures regulatory compliance. The system unifies medical records, clinical documentation, diagnostics and administrative processes within a secure, scalable digital environment. Built on a cloud-native architecture, MedCore delivers reliable performance, real-time data access and long-term scalability for healthcare providers operating at clinic, hospital and network levels.
Pharos Production has partnered with Kimlic to develop a blockchain-based Know Your Customer (KYC) and digital identity platform. This platform ensures that user verification is secure, reusable and privacy-preserving across Web3 and fintech ecosystems. Users can verify their identity once and then securely share proof with multiple services without exposing sensitive personal information. Built on cloud-native infrastructure and equipped with real-time data pipelines, Kimlic provides compliant identity verification at scale while allowing users to retain control over their data.
Pharos Production partnered with Dostyq to create a modern loyalty and rewards platform that helps users collect, manage and exchange bonuses, gift certificates and cashback in one place. The app makes reward usage easier by enabling instant and secure transfers and redemptions. Since 2018, Dostyq has become a trusted shopping partner in Kazakhstan, increasing customer engagement and helping retailers strengthen loyalty programs on a large scale.
About Founder and CTO
Founder and CTO Pharos Production
I design and build reliable software solutions – from lightweight apps to high-load distributed systems and blockchain platforms.
PhD in Artificial Intelligence, MSc in Computer Science (with honors), MSc in Electronics & Precision Mechanics.
- 13 years architecting software solutions tailored to customer needs for startups and enterprises
- 23 years of hands-on experience in custom enterprise software production
- Lecturer at the National Kyiv Polytechnic University
- Doctor of Philosophy in Artificial Intelligence
- Master’s degree in Computer Science, with honors
- Master’s degree in Electronics and Precision Mechanics Engineering
Pharos AI Eval Loop
The Pharos AI Eval Loop is our four-step delivery cycle for production AI: Scope, Build, Eval and Hardening.
1. Scope (1-2 weeks)
Maps the use case to an evaluation set drawn from real client data, defines disallowed behaviors and answers the question "can this be solved without AI?" before any code is written.
Artifacts:
- evaluation set v1
- scope memo
- kill-switch criteria
2. Build (4-8 weeks)
Ships the smallest model and retrieval architecture that beats the baseline on the eval set, with prompt versioning under git and reproducible inference.
Artifacts:
- model card
- prompt registry
- RAG ingestion pipeline
3. Eval (concurrent with Build, then a gated 2-4 weeks)
Runs a shadow-mode comparison against human or rules-engine baselines on live traffic, with no user impact, until accuracy, fairness and latency thresholds are met.
Artifacts:
- shadow-mode report
- accuracy delta
- latency histogram
- fairness audit aligned with the …
4. Hardening (2-4 weeks)
Installs drift detection, output guardrails, audit logging and a documented rollback plan before any production cutover.
Artifacts:
- drift dashboard
- alerting runbook
- rollback playbook
- MLOps retraining cadence
We call it a loop because production AI is never a one-shot delivery: we re-enter Eval and Hardening on every prompt change, model upgrade or data shift over the lifetime of the engagement.
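The Hardening step installs drift detection. One common statistic for this is the population stability index (PSI), which compares the distribution of a monitored value (here, model confidence scores in [0, 1]) in a live window with a reference window such as the eval set. The sketch below is a minimal illustration under stated assumptions, not Pharos tooling: the equal-width binning, the `psi` and `drift_alert` names and the 0.2 alert threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between two samples of scores in [0, 1].

    Equal-width bins; empty bins are floored at eps so the log term stays finite.
    """
    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp into [0, bins - 1] so 1.0 (and stray out-of-range scores) land in an edge bin.
            idx = min(max(int(v * bins), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((lp - rp) * math.log(lp / rp) for rp, lp in zip(ref_p, live_p))


def drift_alert(reference: Sequence[float], live: Sequence[float], threshold: float = 0.2) -> bool:
    """True when PSI exceeds the alert threshold (0.2 here is an assumed rule of thumb)."""
    return psi(reference, live) > threshold
```

Identical windows give a PSI of 0; a live window whose scores all collapse into one bin pushes PSI far above the threshold and would page the on-call engineer via the alerting runbook.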
Phase 01 / 04: Paid Discovery (2-4 weeks)
- Technical validation
- Architecture proposal
- Refined scope estimate
Phase 02 / 04: Iterative Build (2-week sprints)
- Working demos every sprint
- CTO review at milestones
- Architecture decision records (ADRs) documented
Phase 03 / 04: Production Readiness
- Monitoring and alerting
- Security audit and penetration testing
- Runbooks and rollback
Phase 04 / 04: Support (ongoing)
- Security patches
- Performance tuning
- 4-hour SLA response
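Prompt versioning and rollback, named in the Build step and in Production Readiness, can be sketched as a content-addressed registry: each published prompt gets a short SHA-256 hash that can be logged with every response, and rollback re-activates the previous version. `PromptRegistry` is a hypothetical in-memory class for illustration only; it does not replace keeping prompts under git as described above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry: prompt name -> ordered list of (hash, template)."""
    versions: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    active: dict[str, int] = field(default_factory=dict)

    def publish(self, name: str, template: str) -> str:
        """Store a new version, make it active and return its short content hash."""
        digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        history = self.versions.setdefault(name, [])
        history.append((digest, template))
        self.active[name] = len(history) - 1
        return digest

    def get(self, name: str) -> tuple[str, str]:
        """Return (hash, template) of the active version."""
        return self.versions[name][self.active[name]]

    def rollback(self, name: str) -> str:
        """Re-activate the previous version; raise if there is nothing to roll back to."""
        if self.active.get(name, 0) == 0:
            raise ValueError(f"no earlier version of {name!r} to roll back to")
        self.active[name] -= 1
        return self.get(name)[0]
```

Logging the active hash next to each response is what lets an auditor tie any answer to the exact prompt text that produced it.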
Pharos Verified Delivery applied to 70+ production applications since 2013
Real client transformations
Anonymized before/after snapshots from production projects. Metrics measured against client-reported pre-engagement baselines.
Before: 12 full-time agents handling 8,000 tickets per week. Average response time 4.2 hours. Tier-1 questions consumed 70% of agent capacity.
After: Custom AI agent deflects 62% of tier-1 tickets with 91% customer satisfaction. Agents now focus on complex cases. Response time on remaining tickets dropped to 28 minutes.
We started with a 200-question evaluation set built from real ticket history, ran the agent in shadow-mode for 3 weeks against human responses and only routed live traffic once accuracy beat the human baseline on tier-1 categories.
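The routing rule above (go live per category only after the agent beats the human baseline in shadow mode) can be expressed as a small gate. This is a hypothetical sketch: the `ShadowResult` fields, how answers are judged correct and the `min_samples` floor are assumptions, not details from the engagement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ShadowResult:
    category: str
    agent_correct: bool   # agent answer judged correct (labeling method assumed)
    human_correct: bool   # human answer to the same ticket judged correct


def categories_ready_for_live(results: list[ShadowResult], min_samples: int = 50) -> set[str]:
    """Categories where the agent strictly beats the human baseline in shadow mode.

    Both answers are scored on the same tickets, so comparing hit counts is the
    same as comparing accuracy. Categories below min_samples stay in shadow mode.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0])  # [n, agent hits, human hits]
    for r in results:
        t = totals[r.category]
        t[0] += 1
        t[1] += r.agent_correct
        t[2] += r.human_correct
    return {
        cat for cat, (n, agent_hits, human_hits) in totals.items()
        if n >= min_samples and agent_hits > human_hits
    }
```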
Before: Junior attorneys spent 6-8 hours per case reviewing precedent documents, and citations were inconsistent across the team.
After: RAG system over 50,000 case documents with 3-second response time. Citation precision 94% verified against ground truth. Junior attorney research time cut by 75%.
Built on a private vector store with citation tracking back to source paragraphs. Every answer ships with a verifiable footnote so partners can audit any response in under 30 seconds.
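Citation precision is usually defined as the share of emitted citations that point to a passage the ground truth marks as supporting the answer. A minimal sketch under that definition, assuming each answer carries a list of cited paragraph IDs; the `doc#paragraph` ID format is hypothetical, and the exact verification protocol behind the 94% figure is not described here.

```python
def citation_precision(cited: dict[str, list[str]], supporting: dict[str, set[str]]) -> float:
    """Fraction of all emitted citations that appear in the ground-truth supporting set.

    cited:      question id -> paragraph ids the system cited
    supporting: question id -> paragraph ids a reviewer marked as valid support
    """
    total = correct = 0
    for qid, ids in cited.items():
        valid = supporting.get(qid, set())
        total += len(ids)
        correct += sum(1 for pid in ids if pid in valid)
    return correct / total if total else 0.0
```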
Before: Manual orchestration of 6 internal tools for finance ops. 14-day month-end close. Three full-time analysts.
After: Multi-agent system with finance specialist, data extractor, validator and reporter. Month-end close in 3 days with full audit trail. Analysts redeployed to higher-value forecasting work.
Each agent has a narrow tool surface and a structured handoff protocol. Every action is logged with the full prompt, intermediate state and final tool call, so finance can replay and audit any close-cycle step on demand.
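Per-step logging of this kind can be sketched as one append-only JSON Lines record per agent action, which makes any run replayable in recorded order. Field names, agent names and the run-ID scheme below are hypothetical; in practice the lines would be appended to durable storage, while here they are plain strings so the example stays self-contained.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Iterable


@dataclass
class AgentStep:
    run_id: str                 # one close cycle (hypothetical ID scheme)
    step: int                   # position within the run
    agent: str                  # e.g. "extractor", "validator" (illustrative names)
    prompt: str                 # full prompt sent to the model
    state: dict[str, Any]       # intermediate state handed to the next agent
    tool_call: dict[str, Any]   # final tool invocation chosen by the agent


def to_log_line(step: AgentStep) -> str:
    """Serialize one step as a JSON line for an append-only log."""
    return json.dumps(asdict(step), sort_keys=True)


def replay(lines: Iterable[str], run_id: str) -> list[AgentStep]:
    """Rebuild one run's steps, in recorded order, for audit."""
    steps = [AgentStep(**json.loads(line)) for line in lines if line.strip()]
    return sorted((s for s in steps if s.run_id == run_id), key=lambda s: s.step)
```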
Client names anonymized under NDA. Full case studies at /cases/.
When AI is not the answer
We decline roughly 30% of RFPs we receive. Forcing a bad fit costs both sides 3-6 months and damages outcomes. Here is how we think about scope:
- Problems where business rules are deterministic - rules engines are 100x cheaper and fully auditable
- Use cases requiring zero-error guarantees on individual predictions (medical dosing, financial settlement)
- Sub-100ms latency budgets that LLM inference cannot meet
- Projects with no plan for ongoing prompt maintenance, drift monitoring or model versioning
- Data residency requirements that prohibit cloud LLM APIs without budget for self-hosted GPUs
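To make "deterministic business rules" concrete: when a policy is a set of explicit conditions, an ordered rule table runs without any model call, returns the same decision every time and records which rule fired. The refund policy, its thresholds and the `evaluate` helper below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]
    decision: str


# Hypothetical refund policy; rules are checked in order and the first match wins.
RULES = [
    Rule("over_limit", lambda r: r["amount"] > 500, "manual_review"),
    Rule("late_request", lambda r: r["days_since_purchase"] > 30, "reject"),
    Rule("default_approve", lambda r: True, "approve"),
]


def evaluate(request: dict) -> tuple[str, str]:
    """Return (decision, rule name) so every outcome is auditable."""
    for rule in RULES:
        if rule.condition(request):
            return rule.decision, rule.name
    raise ValueError("no rule matched")  # unreachable while default_approve is last
```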
Every AI engagement begins with "can this be solved without AI?" If yes, we say so and recommend the cheaper path. We have lost 15-20% of potential AI projects by being honest about scope - and won 3x more on the projects we did take.
Read before you commit
Original research based on 25+ Pharos AI projects: cost ranges by complexity tier, hidden costs analysis, ROI timelines and team composition.
Reviews
Independent reviews from Clutch, GoodFirms and Google - verified client feedback on our software projects
Based on 9 verified client reviews
Platforms We Work With
Trusted by Coinbase, Consensys, Core Scientific, MicroStrategy, Gate.io and 10+ more Web3 and enterprise platforms
16+ partners. Our 16 technology partners include:
- Consensys
- Gate.io
- Coinbase
- Ludo
- Core Scientific
- Debut Infotech
- Axoni
- Alchemy
- Starkware
- Mara Holdings
- MicroStrategy
- Nubank
- OKX
- Uniswap
- Riot
- Leeway Hertz
Partnerships & Awards
Recognized on Clutch, GoodFirms and The Manifest for software engineering excellence
Reviewed by Dmytro Nasyrov
Founder and CTO
23+ years in custom software development. Led 70+ projects across FinTech, healthcare, Web3 and enterprise. Leads a team aligned with ISO 27001.
Choose your cooperation model
Core software architecture, initial UI/UX, working prototype in 3 months
Software architecture, UI/UX, customized software development, manual and automated testing, cloud deployment
Comprehensive software architecture and documentation, UI/UX design layouts, UI kit, clickable prototypes, cloud deployment, continuous integration, as well as automated monitoring and notifications.
Prices vary based on project scope, complexity, timeline and requirements. Contact us for a personalized estimate.
Or select the appropriate interaction model
Request staff augmentation
Need extra hands on your software project? Our developers can jump in at any stage – from architecture to auditing – and integrate seamlessly with your team to fill any technical gaps.
Hire dedicated experts
Whether you’re building from scratch or scaling fast, our engineers are ready to step in. You stay in control, and we handle the code.
Outsource your project
From first line to final audit, we handle the entire development process and deliver secure, production-ready software while you focus on your business.
| Model | Best for | Team setup time | Budget range |
|---|---|---|---|
| Staff Augmentation | Existing teams needing extra engineers at any project stage | 1-2 weeks | From $5,000/month |
| Dedicated Team (most popular) | Long-term projects requiring full ownership and control | 2-4 weeks | From $15,000/month |
| Project Outsourcing | Full-cycle development from idea to production launch | 1-2 weeks | $10,000-$80,000+ |
Technologies, tools and frameworks we use
Our engineers work with 187+ technologies across blockchain, backend, frontend, mobile and DevOps - chosen for production reliability and performance.
AI and Machine Learning
- LLM providers: 8
- AI frameworks: 15
- Vector databases: 7
- MLOps and infrastructure: 11
- AI agent tools: 4
Blockchains
- Private and public blockchains: 33
- Cloud blockchain solutions: 4
DevOps
- DevOps tools: 15
Clouds
- Clouds: 6
Databases
- Databases: 15
Brokers
- Event and message brokers: 7
Tests
- Test automation tools: 6
UI/UX
- UI/UX design tools: 12
Our approach to the development cycle
Team Assembly
We assemble a full team of specialists with the right blend of skills and experience to start the work.
MVP
We’ll design, build and launch your MVP, ensuring it meets the core requirements of your software solution.
Production
We’ll create a complete software solution custom-made to meet your exact specifications.
Continuous Support (ongoing)
We stay with you, keeping your software solution running smoothly, fixing issues and rolling out updates.
Frequently asked questions about AI development
A production-ready RAG system typically takes 8-12 weeks: 2 weeks discovery and evaluation set creation, 4-6 weeks build (ingestion pipeline, embeddings, retrieval, generation, eval harness), 2-4 weeks production hardening (drift detection, monitoring, rollback). Pharos uses a shadow-mode evaluation phase where the RAG system runs alongside human baselines before going live.
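A toy sketch of the RAG stages named above: ingestion (chunking), embedding, retrieval and grounded prompt assembly. To stay self-contained it swaps in a bag-of-words vector with cosine similarity for a real embedding model and vector store, and stops before the generation call; the chunk size, function names and prompt wording are assumptions.

```python
import math
import re
from collections import Counter


def chunk(doc_id: str, text: str, size: int = 50) -> list[tuple[str, str]]:
    """Ingestion: split a document into fixed-size word windows with stable ids."""
    words = text.split()
    return [(f"{doc_id}#{i // size}", " ".join(words[i:i + size])) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase term counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Retrieval: top-k chunks by cosine similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)[:k]


def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Generation input: context tagged with chunk ids so the answer can cite them."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in hits)
    return f"Answer using only the sources below and cite their ids.\n{context}\n\nQuestion: {query}"
```

Tagging each context block with its chunk id is what makes citation tracking and the eval harness possible downstream.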
Agent costs depend on complexity. A single-purpose AI agent with 2-4 tools and one LLM provider costs $25,000-$60,000 for an MVP. A multi-agent orchestration system with 6-10 specialized agents, structured handoffs and full audit logging runs $80,000-$200,000+. Per-token pricing from OpenAI and Anthropic has fallen 80-90% since 2023, but total bills have risen because agent-loop depth, retrieval size and context windows expanded faster than unit prices fell. The biggest cost driver is not the LLM bill; it is the evaluation set, guardrails and observability you need to run agents safely in production. Our 2026 AI development cost research prices these layer by layer.
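Why can total bills rise while per-token prices fall? In a typical agent loop the whole conversation is re-sent on every step, so input tokens grow roughly quadratically with loop depth. A simplified sketch, assuming a fixed base context (system prompt plus retrieved documents), a fixed number of model output tokens appended per step, and tool results ignored; prices are placeholders, not quotes from any provider.

```python
def agent_run_tokens(base_context: int, tokens_per_step: int, steps: int) -> tuple[int, int]:
    """(input, output) tokens for one agent run under a simple re-send model.

    Step i (1-based) sends base_context + (i - 1) * tokens_per_step input tokens,
    then appends tokens_per_step newly generated tokens to the conversation.
    """
    input_tokens = sum(base_context + (i - 1) * tokens_per_step for i in range(1, steps + 1))
    output_tokens = steps * tokens_per_step
    return input_tokens, output_tokens


def run_cost(input_tokens: int, output_tokens: int, in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost given per-million-token prices (placeholder values, not real quotes)."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000
```

Under this model a 10-step run with a 2,000-token base context and 500 tokens per step sends 42,500 input tokens; doubling to 20 steps sends 135,000, more than three times as many.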
Start with prompt engineering. Move to RAG when you need the model to use your private data or when answers must be grounded in citations. Fine-tune only when (a) the task is narrow and high-volume, (b) you have 1,000+ labeled examples, (c) prompt and RAG approaches plateau on your eval set. Parameter-efficient fine-tuning with LoRA and QLoRA cuts trainable parameter count by two to three orders of magnitude, which makes the training step affordable but does not eliminate the serving, evaluation and on-call costs. In practice, 80% of Pharos AI projects ship without fine-tuning. Fine-tuning makes sense for domain-specific tone, structured output reliability and inference cost reduction at scale.
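The rule of thumb above can be written as a small decision function. This is a sketch, not a Pharos tool: it reads conditions (a)-(c) as all required before fine-tuning, assumes a fine-tuned model can still sit behind retrieval when answers need private data, and the `UseCase` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    needs_private_data: bool      # answers depend on your own documents
    needs_citations: bool         # answers must be grounded in citations
    narrow_high_volume: bool      # condition (a)
    labeled_examples: int         # condition (b): fine-tuning needs 1,000+
    baseline_plateaued: bool      # condition (c): prompt/RAG plateau on the eval set


def customization_strategy(uc: UseCase) -> str:
    """Prompt engineering first, RAG for private or cited data, fine-tune last."""
    use_rag = uc.needs_private_data or uc.needs_citations
    fine_tune = uc.narrow_high_volume and uc.labeled_examples >= 1_000 and uc.baseline_plateaued
    if fine_tune:
        # Assumption: keep retrieval in front of the fine-tuned model when answers need your data.
        return "fine-tune + RAG" if use_rag else "fine-tune"
    return "RAG" if use_rag else "prompt engineering"
```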
Hallucinations are mitigated through layered controls: grounded retrieval (RAG with citation tracking), structured output schemas with validation, confidence thresholds with human handoff, an evaluation set tested on every deploy and runtime guardrails that flag low-confidence answers. We also instrument every response so you can audit any answer back to its source documents.
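Two of these layered controls, structured output validation and a confidence threshold with human handoff, fit in a short routing function. A minimal sketch, assuming the model is asked to return JSON with `answer`, `sources` and `confidence` fields; that schema, the 0.7 threshold and where the confidence score comes from (self-reported or a separate verifier) are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical output schema: field name -> accepted Python types.
REQUIRED_FIELDS = {"answer": (str,), "sources": (list,), "confidence": (int, float)}


@dataclass
class Routed:
    action: str               # "answer" or "human_handoff"
    reason: str
    payload: Optional[dict]


def route_model_output(raw: str, min_confidence: float = 0.7) -> Routed:
    """Validate a JSON model response, then apply grounding and confidence checks."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Routed("human_handoff", "invalid JSON", None)
    if not isinstance(data, dict):
        return Routed("human_handoff", "not a JSON object", None)
    for name, types in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), types):
            return Routed("human_handoff", f"missing or mistyped field: {name}", None)
    if not data["sources"]:
        return Routed("human_handoff", "no supporting sources cited", data)
    if data["confidence"] < min_confidence:
        return Routed("human_handoff", "confidence below threshold", data)
    return Routed("answer", "passed checks", data)
```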
Use cloud LLM APIs (OpenAI, Anthropic, Vertex) when latency is not extreme, data residency rules allow it and your usage is below ~1B tokens/month. Self-host open-source models (Llama, Mistral, Qwen) when you have hard data residency requirements, need sub-200ms latency on long context or your monthly token spend would justify GPU infrastructure. Epoch AI inference cost trends show the crossover point moves every 6 to 12 months as hosted-API prices fall and open-weight models get more efficient. We help model the cost crossover point during discovery.
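The cost side of this decision reduces to a break-even calculation: hosted APIs charge a blended price per token, while self-hosting adds a fixed monthly GPU and operations bill but a lower marginal cost. A simplified linear sketch; all prices are placeholders to replace with your own quotes, and it ignores latency, data residency, GPU capacity limits and engineering time.

```python
def monthly_api_cost(tokens_per_month: float, blended_price_per_m: float) -> float:
    """Hosted API: cost scales linearly with tokens."""
    return tokens_per_month / 1_000_000 * blended_price_per_m


def monthly_self_hosted_cost(tokens_per_month: float, fixed_monthly: float, marginal_price_per_m: float) -> float:
    """Self-hosted: fixed GPU/ops cost plus a smaller marginal cost per token."""
    return fixed_monthly + tokens_per_month / 1_000_000 * marginal_price_per_m


def crossover_tokens(blended_price_per_m: float, fixed_monthly: float, marginal_price_per_m: float) -> float:
    """Monthly token volume above which self-hosting is cheaper.

    Solves fixed + T * m = T * p for T (prices are per million tokens).
    """
    if blended_price_per_m <= marginal_price_per_m:
        return float("inf")  # self-hosting never catches up
    return fixed_monthly / (blended_price_per_m - marginal_price_per_m) * 1_000_000
```

With placeholder inputs of $5 per million tokens via API, $4,000 a month fixed and $1 per million marginal when self-hosted, `crossover_tokens(5.0, 4000.0, 1.0)` returns 1e9. Any drop in API prices raises the break-even volume, so the calculation has to be re-run as prices move.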
Every Pharos AI project includes: a documented use case with intended and disallowed behaviors, an evaluation set covering accuracy, fairness and safety, content filtering for harmful outputs, audit logging of every prompt and response, drift monitoring with alerts, and a rollback plan. Controls map to NIST AI RMF, EU AI Act risk categories and the OWASP Top 10 for LLM Applications. For regulated industries we add bias testing, explainability layers and human-in-the-loop checkpoints on consequential decisions.
We baseline before/after metrics during discovery. For customer support automation: ticket deflection rate, CSAT, agent capacity freed.
For document Q&A: research time per task, citation precision. For multi-agent ops: cycle time, error rate, headcount redeployed. Pharos requires a measurable business metric in every AI engagement - if we cannot define it, we will not start the project.
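Baselining comes down to measuring the same metric before and after. A minimal sketch of two of the metrics named above; the helper names are hypothetical, and the check below reuses the support case's response times (4.2 hours before, 28 minutes after).

```python
def deflection_rate(tickets_total: int, tickets_resolved_by_agent: int) -> float:
    """Share of tickets closed by the AI agent without a human touch."""
    return tickets_resolved_by_agent / tickets_total if tickets_total else 0.0


def relative_change(before: float, after: float) -> float:
    """Signed change versus the discovery baseline (negative means a reduction)."""
    return (after - before) / before
```

For example, `relative_change(4.2 * 60, 28)` is about -0.889, an 89% cut in response time measured in minutes.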
Yes. Pharos AI engineers integrate with existing data warehouses (Snowflake, BigQuery, Redshift), feature stores (Feast, Tecton), MLOps platforms (Vertex, SageMaker, Databricks) and observability (Arize, WhyLabs, Datadog).
We avoid creating parallel infrastructure and prefer to add AI capabilities to your existing data plane.
We decline roughly 30% of AI RFPs. Common reasons: business rules are deterministic and a rules engine is 100x cheaper; the use case requires zero-error guarantees on individual predictions; sub-100ms latency budgets that LLM inference cannot meet; no plan for ongoing prompt maintenance or drift monitoring; data residency rules out cloud LLM APIs without budget for self-hosted GPUs.
Frameworks: LangChain, LlamaIndex, Haystack, DSPy. Model providers: OpenAI, Anthropic Claude, Google Vertex, AWS Bedrock, self-hosted Llama and Mistral.
ML toolkits: PyTorch, TensorFlow, Hugging Face Transformers. Vector stores: Pinecone, Weaviate, pgvector, Qdrant. The right stack depends on your latency budget, data residency rules and existing infrastructure.
Sources and references
External authorities, standards bodies and primary documentation referenced throughout this AI guide.
- NIST AI Risk Management Framework nist.gov
- OpenAI Documentation openai.com
- Anthropic Documentation anthropic.com
- Hugging Face Hub huggingface.co
- LangChain Documentation langchain.com
- OWASP LLM Top 10 owasp.org
- arXiv arxiv.org
- a16z AI Canon a16z.com
Published record
Published Pharos research
Technical articles, comparison guides and methodology deep-dives we write from our own delivery experience.
Your business results matter
Achieve them with minimized risk through our bespoke innovation capabilities
What happens next?
Contact us (same day)
Contact us today to discuss your project. We’re ready to review your request promptly and guide you on the best next steps for collaboration.
NDA (1 day)
We’re committed to keeping your information confidential, so we’ll sign a Non-Disclosure Agreement.
Plan the Goals (3-5 days)
After we chat about your goals and needs, we’ll craft a comprehensive proposal detailing the project scope, team, timeline and budget.
Finalize the Details (1-2 days)
Let’s connect on Google Meet to go through the proposal and confirm all the details together.
Sign the Contract (same day)
As soon as the contract is signed, our dedicated team will jump into action on your project.
Our offices
Headquarters in Las Vegas, Nevada. Engineering office in Kyiv, Ukraine.