Background Jobs & Queues (Redis + BullMQ)
All work in the Mr. Mentor backend that must not block an HTTP request — sending email, AI resume screening, recomputing dashboard KPIs, syncing external LMS/exam platforms, driving the sales-automation workflow engine, reconciling AI phone calls — is pushed onto a BullMQ queue backed by a single Redis instance and consumed by one of ~21 long-lived Workers started at boot. This document is the canonical map of every queue, every worker, the producer→queue→worker→side-effect flow, the scheduling/cron timeline, the job lifecycle, and the Bull Board monitoring UI.
Status: documented from source on this branch.
Overview
- What it does. Offloads slow, retryable, or scheduled work from the request path onto Redis-backed job queues. Producers (controllers, services, socket handlers, other workers) enqueue jobs; workers process them asynchronously with their own concurrency and rate limits.
- Who uses it. Indirectly everyone — every product surface in the suite relies on at least one queue (Mr. Mentor meetings → email + cleanup, Mr. Hire → resumeAnalysis, LMS → daily-cards/badges/warnings/risk, Sales CRM → workflow/leadAssignment/whatsapp/aaryaSync, Finance → email invoices). Operators (ADMIN / SUPERADMIN) observe and replay jobs through Bull Board.
- Where it sits. A cross-cutting infrastructure layer. The two singletons are src/services/QueueService.ts (the producer side, which owns all Queue objects and the add*/schedule* helpers) and src/config/redis.ts (RedisService, a general-purpose Redis client also used for caching). Workers live in src/workers/* and are dynamically imported during the boot sequence in src/index.ts.
Key facts to keep in mind throughout:
- One Redis, one logical DB. Every queue and the cache share REDIS_HOST / REDIS_PORT (defaults localhost:6379). Password / DB-index lines are commented out in both QueueService.ts and redis.ts, so auth is currently off.
- No BullMQ-level retries are configured. No queue sets attempts or backoff in its defaultJobOptions, so the effective attempts is 1: a thrown error moves the job straight to failed (see Edge cases). Some workers implement their own in-handler retry loops.
- Repeatable jobs are re-registered on every boot with stable jobIds, so a config change or a Redis flush is always reconciled at startup (src/index.ts).
Key concepts & entities
This domain owns infrastructure, not TypeORM entities — its "data" is Redis keys (BullMQ job hashes, wait/active/completed/failed lists, repeatable-job schedules). It does, however, read and write many business entities through the services each worker calls.
| Concept | Meaning |
|---|---|
| Queue | A named BullMQ list in Redis (e.g. emailQueue). Created in QueueService's constructor. |
| Job | A unit of work added with queue.add(name, data, options). Carries a name, a data payload, and options. |
| Worker | A long-lived consumer bound to one queue name (new Worker('emailQueue', processor, opts)). One file per worker in src/workers/. |
| Repeatable job | A job with repeat: { every } or repeat: { pattern } (cron). BullMQ re-enqueues it on schedule. Registered via schedule*() helpers. |
| Stable jobId | A fixed jobId (e.g. badge-evaluation-daily, mrlearn-sync-<configId>) so re-registering replaces the prior schedule instead of stacking duplicates. |
| defaultJobOptions | Per-queue defaults; here only removeOnComplete / removeOnFail (how many finished jobs to retain for inspection). |
| concurrency | How many jobs a worker processes in parallel. Most are 1; high-throughput queues use 3-5. |
| limiter | Rate cap. Only emailQueue sets one (max: 10 per 1000 ms). |
Relevant source files:
- src/services/QueueService.ts: singleton producer + all queue definitions + all schedulers (1144 lines).
- src/config/redis.ts: RedisService singleton (ioredis), lazyConnect, reconnect-on-READONLY.
- src/workers/*.worker.ts: 21 worker files.
- src/routes/bullBoard.routes.ts: Bull Board dashboard, basic-auth protected.
- src/types/EmailQueue.types.ts, src/types/DatabaseQueue.types.ts, src/types/ResumeQueue.types.ts: job payload contracts.
Architecture
flowchart TD
subgraph PROD["Producers"]
CTRL["Controllers / Services / Socket handlers"]
SCHED["Boot schedulers (src/index.ts)"]
WRK2WRK["Worker-to-worker enqueues"]
end
QS["QueueService singleton (src/services/QueueService.ts)"]
REDIS[("Redis (BullMQ + cache)")]
subgraph WORKERS["Workers (src/workers)"]
EMAIL["email.worker (x5)"]
DB["database.worker (x3)"]
RESUME["resumeAnalysis.worker (x5)"]
WF["workflow.worker (x5)"]
OTHERS["cleanup / kpi / warning / daily-cards / badges / risk / sync workers (x1)"]
end
subgraph SIDE["Side effects"]
SMTP["Nodemailer / Gmail SMTP"]
PG[("PostgreSQL via TypeORM")]
S3["AWS S3"]
EXT["ElevenLabs / LiveKit / Graphy / EzExam / WhatsApp / Exotel"]
CACHE["Redis cache keys (KPIs, stats)"]
end
CTRL -->|"addEmailJob / addResumeAnalysisJob / addWorkflowStepJob"| QS
SCHED -->|"schedule* repeatable jobs"| QS
WRK2WRK --> QS
QS -->|"queue.add"| REDIS
REDIS -->|"reserve job"| EMAIL
REDIS --> DB
REDIS --> RESUME
REDIS --> WF
REDIS --> OTHERS
EMAIL --> SMTP
DB --> PG
RESUME --> PG
RESUME --> S3
RESUME --> EXT
WF --> PG
WF --> EXT
OTHERS --> PG
OTHERS --> CACHE
OTHERS --> EXT
Each worker opens its own Redis connection inline (it does not reuse RedisService). The producer side is a
single QueueService instance obtained via QueueService.getInstance().
Data model
There are no SQL tables for this domain. The "schema" is the set of Redis structures BullMQ maintains per queue,
plus the job payload shapes declared in src/types/*Queue.types.ts.
erDiagram
QUEUE ||--o{ JOB : "holds"
QUEUE ||--o{ REPEATABLE : "schedules"
JOB ||--|| PAYLOAD : "carries"
WORKER }o--|| QUEUE : "consumes"
QUEUE {
string name PK
int removeOnComplete
int removeOnFail
}
JOB {
string id PK
string name
string state
int attemptsMade
json data
}
REPEATABLE {
string jobId PK
string everyOrPattern
string tz
}
WORKER {
string queueName PK
int concurrency
json limiter
}
PAYLOAD {
string type
json fields
}
Representative payload contracts:
- Email (EmailJobData, src/types/EmailQueue.types.ts): a discriminated union over type (meeting-approval, payment-invoice, crm-lead-email, workflow-email, welcome-email, ~30 variants).
- Database (DatabaseJobData, src/types/DatabaseQueue.types.ts): create-token, update-mentor-rating, updateMentorMultiplier, deleteUser, backfill-applications.
- Resume (ResumeAnalysisJobData, src/types/ResumeQueue.types.ts): type of resume-analysis / full-screening / ai-call plus applicationId, resumeUrl / resumeKey, jobPostId, authToken.
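To make the "discriminated union over type" idea concrete, here is a minimal sketch of how a handler can switch on the discriminant and have the compiler flag unhandled variants. The three variants are real type names from the list above, but every field other than type (to, meetingId, invoiceUrl, name) and the helper subjectFor are assumptions for illustration; the actual contracts live in src/types/EmailQueue.types.ts.

```typescript
// Hypothetical, trimmed-down payload shapes. The real union has ~30 variants
// and its own field names; only the `type` values below come from the docs.
type EmailJobData =
  | { type: 'meeting-approval'; to: string; meetingId: string }
  | { type: 'payment-invoice'; to: string; invoiceUrl: string }
  | { type: 'welcome-email'; to: string; name: string };

// Compile-time guard: if a variant is added without a matching case,
// the argument no longer narrows to `never` and the build fails.
function assertNever(value: never): never {
  throw new Error(`Unhandled email job: ${JSON.stringify(value)}`);
}

// Illustrative handler: narrows on `type`, then reads variant-specific fields.
function subjectFor(job: EmailJobData): string {
  switch (job.type) {
    case 'meeting-approval':
      return `Meeting ${job.meetingId} approved`;
    case 'payment-invoice':
      return `Your invoice: ${job.invoiceUrl}`;
    case 'welcome-email':
      return `Welcome, ${job.name}`;
    default:
      return assertNever(job);
  }
}
```

The same pattern would apply to the database and resume payloads, since each also carries a type-like discriminant.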
API surface
The only HTTP surface for this domain is the Bull Board dashboard. It is constructed in
src/routes/bullBoard.routes.ts and mounted in src/routes/index.ts (this.router.use('/', this.bullBoardRoutes.router)),
which is itself mounted at / by src/app.ts. Inside BullBoardRoutes the express adapter base path is
/admin/queues and the router guards it with HTTP Basic auth.
| Method | Path | Auth/role | Purpose |
|---|---|---|---|
| GET | /admin/queues | HTTP Basic (BULL_BOARD_USERNAME / BULL_BOARD_PASSWORD, set via env vars) | Bull Board UI: list queues, inspect jobs |
| GET/POST | /admin/queues/api/* | HTTP Basic | Bull Board internal API (job lists, retry, promote, clean) used by the UI |
Notes:
- Bull Board registers 19 of the 22 queues (see QueueService.getQueues()). warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not in the Bull Board list and so are not visible in the dashboard, even though their workers run.
- This is not under /api and does not use the JWT authMiddleware; it has its own Basic-auth gate.
- There is no JSON REST API to enqueue jobs directly; jobs are produced in-process via QueueService methods called from controllers/services. Several services also expose admin endpoints that internally call triggerMrLearnSyncNow, addWorkflowScanJob, addAaryaSyncJob, etc. (those endpoints belong to their feature docs).
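As a quick sanity check on the "19 of 22" figure, the sketch below lists the queue names that appear in this document and removes the three that are not registered with Bull Board. It models the registry as a plain array of names; the actual return shape of QueueService.getQueues() is not shown here, so treat that as an assumption.

```typescript
// Every queue name mentioned in this document (22 in total).
const allQueues: string[] = [
  'emailQueue', 'whatsappQueue', 'databaseQueue', 'cleanupQueue', 'kpiQueue',
  'resumeAnalysisQueue', 'salaryBenchmarkQueue', 'warningQueue',
  'leadAssignmentQueue', 'dailyCardsQueue', 'coursePlanQueue', 'missOzoneQueue',
  'mrlearnSyncQueue', 'mrlearnReminderQueue', 'mrlearnNewStudentSyncQueue',
  'mrtestSyncQueue', 'badgeEvaluationQueue', 'studentRiskComputationQueue',
  'assignmentReminderQueue', 'aaryaSyncQueue', 'workflowQueue', 'notificationQueue',
];

// Queues whose workers run but which are not listed in the dashboard.
const hiddenFromBullBoard = new Set<string>([
  'warningQueue', 'badgeEvaluationQueue', 'studentRiskComputationQueue',
]);

// Set difference: what an operator can actually see in Bull Board.
const visibleInBullBoard = allQueues.filter((name) => !hiddenFromBullBoard.has(name));

console.log(`${visibleInBullBoard.length} of ${allQueues.length} queues visible`);
```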
Workers & queues: the master table
Concurrency / retention values are read from each worker file and QueueService defaultJobOptions.
"Trigger/Schedule" lists how jobs reach the queue.
| Worker (src/workers/) | Queue | Trigger / Schedule | Purpose |
|---|---|---|---|
| email.worker (conc 5, limiter 10/sec) | emailQueue | On demand via addEmailJob | Transactional email for ~30 types: meeting approve/confirm/expire, onboarding (mentor/admin/sales/student/HR/CM/super-mentor), password-changed, role-changed, payment invoice, quiz invite, discount approved, batch-meeting, CRM lead email, telecaller assignment. Sends via Nodemailer/Gmail. |
| whatsapp.worker (conc 5) | whatsappQueue | On demand via addWhatsAppJob | WhatsApp sends. Currently handles APPLICATION_MEETING_REMINDER through ApplicationMeetingReminderService. |
| database.worker (conc 3) | databaseQueue | On demand + cron 0 2 * * * IST | Heavy DB ops off the request path: create-token, update-mentor-rating, updateMentorMultiplier, deleteUser, and the scheduled backfill-applications job. |
| cleanup.worker (conc 1) | cleanupQueue | Repeatable every 15m (scheduleSlotCleanup) | Expire stale/unbooked slots via SlotCleanupService. (Doc comment says 24h; code is 15 min.) |
| kpi.worker (conc 1) | kpiQueue | Repeatable every 15m (two jobs) | Recompute + cache dashboard KPIs (dashboardKpiCalculation) and sales overview stats (salesOverviewStatsCalculation) into Redis via AdminDashboardService. |
| resumeAnalysis.worker (conc 5) | resumeAnalysisQueue | On demand via addResumeAnalysisJob | Mr. Hire screening pipeline: resume-analysis / full-screening (LLM scoring), ai-call (dispatch AI phone interview), ai-call-check (poll result). Re-enqueues follow-up jobs onto itself. |
| salaryBenchmark.worker (conc 1) | salaryBenchmarkQueue | Repeatable every 15 days | Refresh salary benchmark data via SalaryBenchmarkService. |
| warning.worker (conc 1) | warningQueue | Cron 59 23 * * * IST + on demand | Process the day's attendance absences and create student warnings. |
| leadAssignment.worker (conc 1) | leadAssignmentQueue | Repeatable every 15m is scheduled, but the worker import is commented out in index.ts | Round-robin auto-assign unassigned CRM leads via SalesDashboardService. Schedule is registered yet no consumer runs, so jobs accumulate (inferred). |
| dailyCards (daily-cards.worker, conc 1) | dailyCardsQueue | Cron 0 0 * * * IST | Generate per-student daily engagement/gamification cards (Redis-backed). |
| course-plan.worker (conc 1) | coursePlanQueue | On demand | Generate course-plan / lecture content "one lecture at a time". |
| missOzone.worker (conc 4) | missOzoneQueue | Repeatable every 60s (poll-active-calls) + cron 30 3 * * * (prune-old-calls) + on demand | AI voice-mentor (Miss Ozone) call reconciliation against LiveKit, evaluation fetch with in-handler retries, and daily retention prune of transcript/eval blobs. |
| mrlearnSync.worker (conc 1) | mrlearnSyncQueue | Per-config repeatable every intervalHours + manual | Sync learners/progress from Graphy (Mr. Learn) per MrLearnSyncConfig. Supports fast incremental mode. |
| mrlearnReminder.worker (conc 1) | mrlearnReminderQueue | Per-config repeatable every reminderIntervalHours + manual | Send WhatsApp course-progress nudges to learners below a progress threshold. |
| mrlearnNewStudentSync.worker (conc 1) | mrlearnNewStudentSyncQueue | Global repeatable, SystemConfig-driven interval | Fan out a fast new-enrolment sync across all enabled MrLearnSyncConfigs. |
| mrtestSync.worker (conc 1) | mrtestSyncQueue | Per-config repeatable every intervalHours + manual | Sync exam/test data from EzExam (Mr. Test) per MrTestSyncConfig. |
| badgeEvaluation.worker (conc 1) | badgeEvaluationQueue | Repeatable every 24h | Safety-net cron re-running BadgeService.evaluateForUser for enrolled students (catches badges like profile_pro whose events skip grantXp). |
| studentRiskComputation.worker (conc 1) | studentRiskComputationQueue | Repeatable every 24h | Walk active batches and write fresh student_risk_scores rows via StudentRiskService (BL "Students at Risk" card). |
| assignmentReminder.worker (conc 1) | assignmentReminderQueue | Cron 0 19 * * * IST | Notify students of assignments due in the next 24h (assignment_due_24h template, subject to a 3/day cap). |
| aaryaSync.worker (conc 1) | aaryaSyncQueue | Repeatable every 15m + on demand | Poll ElevenLabs for dispatched Aarya calls; write back duration + conversation_id + predicted lead interest (AaryaCallSyncService). |
| workflow.worker (conc 5) | workflowQueue | Repeatable scan every 5m + event/step/wake enqueues | Sales-automation engine: workflowScan (enroll matching leads), workflowStep (advance one enrollment by one node, incl. delayed Wait steps), workflowEvent (react to lead lifecycle events). |
| (no dedicated worker) | notificationQueue | addNotificationJob | Declared and exposed in Bull Board, but no notification.worker consumes it on this branch (inferred; notifications are largely handled inline / via email + whatsapp queues). |
Producer→queue→worker→side-effects, for the busiest queues:
flowchart LR
subgraph P["Producers"]
MC["Meeting / Auth / Payment controllers"]
HC["Mr. Hire job-application flow"]
SC["Sales CRM lead events"]
CR["Boot cron schedulers"]
end
MC -->|"addEmailJob"| EQ["emailQueue"]
MC -->|"addDatabaseJob"| DQ["databaseQueue"]
HC -->|"addResumeAnalysisJob"| RQ["resumeAnalysisQueue"]
SC -->|"addWorkflowEventJob / addWorkflowStepJob"| WQ["workflowQueue"]
CR -->|"every 15m"| KQ["kpiQueue"]
CR -->|"every 15m"| CQ["cleanupQueue"]
EQ --> EW["email.worker"] --> SMTP["Gmail SMTP"]
DQ --> DW["database.worker"] --> PG[("PostgreSQL")]
RQ --> RW["resumeAnalysis.worker"] --> LLM["LLM + AI call + S3"]
WQ --> WW["workflow.worker"] --> WFX["lead state + Aarya calls + emails"]
KQ --> KW["kpi.worker"] --> RC["Redis cache"]
CQ --> CW["cleanup.worker"] --> PG
User journeys
1. Transactional email (e.g. meeting confirmation)
A controller never sends mail synchronously — it enqueues a job and returns immediately. The email.worker
(concurrency 5, rate-limited to 10/sec) drains the queue.
sequenceDiagram
participant FE as Frontend
participant API as Meeting controller
participant QS as QueueService
participant R as Redis
participant EW as email.worker
participant SMTP as Gmail SMTP
FE->>API: confirm meeting
API->>QS: addEmailJob with type meeting-confirmation
QS->>R: enqueue sendEmail job on emailQueue
API-->>FE: 200 OK queued
R->>EW: reserve job when a slot is free
EW->>EW: switch on data.type then build template
EW->>SMTP: send mail
alt send succeeds
SMTP-->>EW: accepted
EW->>R: mark completed and keep last 50
else send throws
SMTP-->>EW: error
EW->>R: mark failed and keep last 50
Note over EW,R: no attempts configured so no auto-retry
end
2. Mr. Hire resume screening (multi-stage, self-chaining)
The screening pipeline is the most complex flow. A single application can pass through resume-analysis,
full-screening, ai-call, and ai-call-check — and the worker enqueues its own follow-up jobs rather than
blocking on a long AI phone call.
sequenceDiagram
participant API as Job-application API
participant QS as QueueService
participant R as Redis
participant RW as resumeAnalysis.worker
participant S3 as AWS S3
participant AI as LLM and AI-call provider
participant PG as PostgreSQL
API->>QS: addResumeAnalysisJob type full-screening
QS->>R: enqueue analyzeResume job
R->>RW: reserve job
RW->>S3: fetch resume by key
RW->>AI: score resume against job post
RW->>PG: write ScreeningResult and tier
alt rules say make AI call
RW->>AI: dispatch AI phone interview
RW->>QS: enqueue follow-up ai-call-check job
Note over RW: worker released instead of blocking 6 plus minutes
R->>RW: reserve ai-call-check later
RW->>AI: poll call result
alt result ready
RW->>PG: store communication score and transcript
else still running
RW->>QS: re-enqueue ai-call-check with incremented count
end
else no call needed
RW->>PG: finalize composite score
end
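The self-chaining poll in the diagram above can be sketched as a pure decision function: given the latest poll result and how many checks have already run, decide whether to store the result or re-enqueue another ai-call-check. The type names, the cap MAX_CHECKS, and the delay CHECK_DELAY_MS are hypothetical; the document only states that the follow-up job is re-enqueued with an incremented count.

```typescript
// Hypothetical shapes for the provider poll result and the worker's decision.
type CallPoll =
  | { status: 'ready'; communicationScore: number }
  | { status: 'running' };

type NextAction =
  | { kind: 'store'; communicationScore: number }
  | { kind: 're-enqueue'; checkCount: number; delayMs: number }
  | { kind: 'give-up'; checkCount: number };

// Assumed limits, not taken from the codebase.
const MAX_CHECKS = 10;
const CHECK_DELAY_MS = 60000;

// Decide what the worker does after one ai-call-check poll.
// The worker never blocks on the call: it either finishes or hands off
// to a later job carrying an incremented counter.
function decideNext(poll: CallPoll, checkCount: number): NextAction {
  if (poll.status === 'ready') {
    return { kind: 'store', communicationScore: poll.communicationScore };
  }
  const next = checkCount + 1;
  if (next >= MAX_CHECKS) {
    return { kind: 'give-up', checkCount: next };
  }
  return { kind: 're-enqueue', checkCount: next, delayMs: CHECK_DELAY_MS };
}
```

Carrying the counter in the job payload is what lets a stateless worker bound the total polling time without holding a concurrency slot for the duration of the call.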
3. Scheduled KPI recompute (repeatable every 15 minutes)
sequenceDiagram
participant Boot as src index startServer
participant QS as QueueService
participant R as Redis
participant KW as kpi.worker
participant ADS as AdminDashboardService
participant PG as PostgreSQL
participant Cache as Redis cache
Boot->>QS: scheduleKpiCalculation
QS->>R: remove old repeatable then add two repeatable jobs every 15m
loop every 15 minutes
R->>KW: deliver dashboardKpiCalculation job
KW->>ADS: compute dashboard KPIs
ADS->>PG: aggregate queries
KW->>Cache: write cached KPI payload
R->>KW: deliver salesOverviewStatsCalculation job
KW->>Cache: write cached sales overview
end
4. Sales-automation workflow engine (scan + step + event)
The workflowQueue carries three job names. The 5-minute workflowScan enrolls newly-matching leads; each
enrollment then advances one node per workflowStep job, with delayed jobs used for Wait nodes; lead lifecycle
changes publish workflowEvent jobs for event-driven workflows.
sequenceDiagram
participant Boot as Boot scheduler
participant CRM as CRM lead change
participant QS as QueueService
participant R as Redis
participant WW as workflow.worker
participant ENG as WorkflowEngineService
participant PG as PostgreSQL
Boot->>QS: scheduleWorkflowScan every 5m
CRM->>QS: addWorkflowEventJob with leadIds
QS->>R: enqueue workflowScan and workflowEvent jobs
R->>WW: reserve workflowScan
WW->>ENG: runTriggerScan
ENG->>PG: find matching leads and create enrollments
ENG->>QS: addWorkflowStepJob per new enrollment
R->>WW: reserve workflowStep
WW->>ENG: processEnrollmentStep
alt node is a Wait step
ENG->>QS: re-enqueue workflowStep with delay
else node is an action
ENG->>PG: apply action and advance pointer
end
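As a rough illustration of "one node per workflowStep job", the sketch below advances an enrollment by exactly one node and reports what should be enqueued next. All shapes here (WorkflowNode, Enrollment, StepOutcome) and the function advanceOneNode are hypothetical; in particular, whether the real engine moves the pointer before or after a Wait delay is not stated in this document.

```typescript
// Hypothetical node and enrollment models for illustration only.
type WorkflowNode =
  | { kind: 'wait'; delayMs: number }
  | { kind: 'action'; name: string };

interface Enrollment {
  id: string;
  pointer: number; // index of the next node to process
}

type StepOutcome =
  | { kind: 'finished' }
  | { kind: 'enqueue-step'; enrollmentId: string; pointer: number; delayMs: number };

// Process a single node, then describe the follow-up job (if any).
function advanceOneNode(
  nodes: WorkflowNode[],
  enrollment: Enrollment,
  applyAction: (name: string) => void,
): StepOutcome {
  const node = nodes[enrollment.pointer];
  if (node === undefined) return { kind: 'finished' };

  let delayMs = 0;
  if (node.kind === 'wait') {
    // A Wait node does no work; it only delays the next step job.
    delayMs = node.delayMs;
  } else {
    applyAction(node.name);
  }

  const pointer = enrollment.pointer + 1;
  if (pointer >= nodes.length && delayMs === 0) return { kind: 'finished' };
  return { kind: 'enqueue-step', enrollmentId: enrollment.id, pointer, delayMs };
}
```

Modelling a Wait as a delayed job rather than a sleeping handler keeps the worker's five concurrency slots free for other enrollments.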
5. Aarya / Miss Ozone AI-call reconciliation (poll-back)
AI phone calls are dispatched elsewhere and reconciled by repeatable pollers, because the provider result is not available synchronously.
sequenceDiagram
participant Boot as Boot scheduler
participant QS as QueueService
participant R as Redis
participant AW as aaryaSync.worker
participant EL as ElevenLabs API
participant PG as PostgreSQL
Boot->>QS: scheduleAaryaSync every 15m
QS->>R: add repeatable aaryaCallSync job
loop every 15 minutes
R->>AW: deliver aaryaCallSync job
AW->>EL: list dispatched-but-unfinished conversations
alt conversation finished
EL-->>AW: duration and conversation id and transcript
AW->>PG: update LeadCallLog and RawLead predicted interest
else still in progress
Note over AW: leave for next tick
end
end
missOzone.worker follows the same pattern at a tighter cadence — poll-active-calls every 60 s reconciles
calls stuck in scheduled/dialing/completed against LiveKit, with an in-handler retry loop for evaluation fetches,
and a daily prune-old-calls at 03:30 that NULLs heavy transcript/eval columns and deletes long-failed rows.
6. External LMS / exam sync (per-config repeatable)
sequenceDiagram
participant Admin as Admin panel
participant Svc as MrLearn config service
participant QS as QueueService
participant R as Redis
participant SW as mrlearnSync.worker
participant GR as Graphy API
participant PG as PostgreSQL
Admin->>Svc: enable sync config with intervalHours
Svc->>QS: scheduleMrLearnSync configId intervalHours
QS->>R: add repeatable job with stable id mrlearn-sync configId
loop every intervalHours
R->>SW: deliver mrlearnSync job
SW->>GR: pull learners and progress
SW->>PG: upsert learner rows
end
Admin->>Svc: trigger manual sync now
Svc->>QS: triggerMrLearnSyncNow with fast flag
QS->>R: enqueue one-shot job
R->>SW: deliver and process immediately
On boot, src/index.ts re-reads MrLearnSyncConfig (enabled / reminderEnabled), MrTestSyncConfig (enabled),
and SystemConfig keys (MRLEARN_NEW_STUDENT_SYNC_ENABLED / ..._INTERVAL_HOURS) from Postgres and recreates
every Redis-side schedule, so DB config is always the source of truth even after a Redis flush.
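The boot-time reconciliation described above can be pictured as a pure planning step: read config rows, then derive one schedule or unschedule action per row. This is only a sketch; SyncConfig, ScheduleAction, and planMrLearnSchedules are hypothetical names, the numeric id type is an assumption, and only the enabled / intervalHours fields and the mrlearn-sync-<configId> id format come from this document.

```typescript
// Hypothetical projection of a MrLearnSyncConfig row.
interface SyncConfig {
  id: number;
  enabled: boolean;
  intervalHours: number;
}

type ScheduleAction =
  | { op: 'schedule'; jobId: string; everyMs: number }
  | { op: 'unschedule'; jobId: string };

const HOUR_MS = 60 * 60 * 1000;

// Derive the desired Redis-side schedules from database rows.
// Because the jobId is a pure function of the config id, running this
// on every boot converges to the same state even after a Redis flush.
function planMrLearnSchedules(configs: SyncConfig[]): ScheduleAction[] {
  return configs.map((config): ScheduleAction => {
    const jobId = `mrlearn-sync-${config.id}`;
    return config.enabled
      ? { op: 'schedule', jobId, everyMs: config.intervalHours * HOUR_MS }
      : { op: 'unschedule', jobId };
  });
}
```

For example, an enabled config with id 7 and intervalHours 6 yields a schedule action for mrlearn-sync-7 every 21,600,000 ms, while a disabled config yields an unschedule action for its id.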
Background jobs & async
Scheduling / cron timeline
All cron patterns use tz: 'Asia/Kolkata' (IST) unless noted; every-based intervals are measured from the moment of registration.
| Cadence | Job(s) | Queue | Registered by |
|---|---|---|---|
| every 60 s | poll-active-calls | missOzoneQueue | scheduleMissOzonePolling |
| every 5 min | workflowScan | workflowQueue | scheduleWorkflowScan |
| every 15 min | slotCleanup | cleanupQueue | scheduleSlotCleanup |
| every 15 min | dashboardKpiCalculation + salesOverviewStatsCalculation | kpiQueue | scheduleKpiCalculation |
| every 15 min | aaryaCallSync | aaryaSyncQueue | scheduleAaryaSync |
| every 15 min | leadAutoAssignment (consumer disabled) | leadAssignmentQueue | scheduleLeadAutoAssignment |
| 19:00 IST (0 19 * * *) | remindAssignmentsDue | assignmentReminderQueue | scheduleAssignmentReminders |
| 23:59 IST (59 23 * * *) | dailyWarningProcessing | warningQueue | scheduleDailyWarningProcessing |
| 02:00 IST (0 2 * * *) | backfillApplications | databaseQueue | scheduleApplicationBackfill |
| 03:30 (30 3 * * *) | prune-old-calls | missOzoneQueue | scheduleMissOzonePrune |
| 00:00 IST (0 0 * * *) | generateDailyCards | dailyCardsQueue | scheduleDailyCards |
| every 24 h | badgeEvaluation | badgeEvaluationQueue | scheduleBadgeEvaluation(24) |
| every 24 h | studentRiskComputation | studentRiskComputationQueue | scheduleStudentRiskComputation(24) |
| every 15 days | salaryBenchmarkRefresh | salaryBenchmarkQueue | scheduleSalaryBenchmarkRefresh |
| per-config every intervalHours | mrlearnSync / mrtestSync | mrlearnSyncQueue / mrtestSyncQueue | scheduleMrLearnSync / scheduleMrTestSync |
| per-config every reminderIntervalHours | mrlearnReminder | mrlearnReminderQueue | scheduleMrLearnReminder |
| SystemConfig-driven every Nh | mrlearnNewStudentSync | mrlearnNewStudentSyncQueue | scheduleMrLearnNewStudentSync |
flowchart LR
T60["60s: missOzone poll"]
T5["5m: workflow scan"]
T15["15m: cleanup, kpi x2, aaryaSync, leadAssign"]
D1["daily: dailyCards 00:00, backfill 02:00, prune 03:30, assignReminder 19:00, warning 23:59"]
H24["24h: badges, student risk"]
D15["15d: salary benchmark"]
CFG["per-config: mrlearn/mrtest sync, mrlearn reminder, new-student sync"]
Two scheduling styles coexist: repeat: { every: ms } (fixed interval from registration) and
repeat: { pattern: cron, tz } (wall-clock cron). Per-config and per-flag schedules use a stable jobId
(e.g. mrlearn-sync-<configId>) so re-registration replaces rather than duplicates.
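The two scheduling styles can be written down as plain option objects. The field names every, pattern, and tz are the BullMQ repeat options referred to above; the RepeatSpec type and the describeRepeat helper are hypothetical and exist only to make the contrast visible.

```typescript
// Either a fixed interval in milliseconds or a cron pattern with a timezone.
type RepeatSpec =
  | { every: number }
  | { pattern: string; tz: string };

const MINUTE_MS = 60 * 1000;

// Fixed interval: fires every 15 minutes counted from registration.
const slotCleanupRepeat: RepeatSpec = { every: 15 * MINUTE_MS };

// Wall-clock cron: fires at 19:00 in IST regardless of when the process booted.
const assignmentReminderRepeat: RepeatSpec = { pattern: '0 19 * * *', tz: 'Asia/Kolkata' };

// Human-readable summary, handy for boot logs.
function describeRepeat(repeat: RepeatSpec): string {
  return 'every' in repeat
    ? `every ${repeat.every / MINUTE_MS} min`
    : `cron "${repeat.pattern}" in ${repeat.tz}`;
}

console.log(describeRepeat(slotCleanupRepeat));
console.log(describeRepeat(assignmentReminderRepeat));
```

The practical difference is drift: an every-based job's firing times depend on when it was registered, while a cron job with tz stays pinned to the clock.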
Worker concurrency & retention summary
| Property | Value(s) | Where |
|---|---|---|
| Concurrency 5 | email, whatsapp, resumeAnalysis, workflow | each worker opts.concurrency |
| Concurrency 4 | missOzone | missOzone.worker |
| Concurrency 3 | database | database.worker |
| Concurrency 1 | all remaining workers | their worker files |
| Rate limiter | emailQueue only: max 10 per 1000 ms | email.worker |
| Retention | removeOnComplete / removeOnFail ranging 10-200 per queue | QueueService defaultJobOptions |
| Retries / backoff | none configured (attempts defaults to 1) | absent from all defaultJobOptions |
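To get a feel for what the email limiter means in practice, the sketch below writes the worker tuning as a plain object (concurrency and limiter.max / limiter.duration are the option names used above) and estimates a rough lower bound on how long a burst takes to drain. WorkerTuning and minDrainMs are hypothetical helpers, and the estimate ignores handler time, so real drains will be slower.

```typescript
// Subset of worker options discussed in the table above.
interface WorkerTuning {
  concurrency: number;
  limiter?: { max: number; duration: number };
}

// Values documented for email.worker.
const emailWorkerTuning: WorkerTuning = {
  concurrency: 5,
  limiter: { max: 10, duration: 1000 },
};

// Rough lower bound: at most `max` jobs may start per `duration` window,
// so N jobs need ceil(N / max) windows, and the last window need not elapse.
function minDrainMs(jobs: number, tuning: WorkerTuning): number {
  if (jobs <= 0 || !tuning.limiter) return 0;
  const windows = Math.ceil(jobs / tuning.limiter.max);
  return (windows - 1) * tuning.limiter.duration;
}

console.log(minDrainMs(25, emailWorkerTuning));
```

Under these assumptions a burst of 25 emails cannot finish in under about 2 seconds, however many concurrency slots are free.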
Socket events & webhooks
This domain has no Socket.IO events or inbound webhooks of its own. (Socket.IO lives in src/socket.ts for
meetings; provider callbacks for Exotel/Razorpay/Leegality are HTTP routes owned by their feature docs.) AI-call
results are obtained by polling workers (aaryaSync, missOzone, resumeAnalysis ai-call-check), not by webhook.
External integrations
| System | Used by worker(s) | Env / config | Failure / fallback |
|---|---|---|---|
| Redis | all queues + workers | REDIS_HOST, REDIS_PORT (defaults localhost:6379); password/db commented out | RedisService uses lazyConnect + reconnect-on-READONLY; if Redis is down at boot, initializeRedis() throws and the process exits. |
| Gmail SMTP (Nodemailer) | email.worker | EMAIL_USER, EMAIL_PASS | Auth failures (535) throw → job marked failed (no auto-retry); see Memory note on rotating the app password and replaying stuck jobs. |
| ElevenLabs (Aarya) | aaryaSync.worker, resumeAnalysis.worker (ai-call) | provider keys (feature env) | Single-flight concurrency: 1 on aaryaSync to avoid hammering; unfinished conversations are simply re-polled next tick. |
| LiveKit (Miss Ozone) | missOzone.worker | LiveKit base/API env | In-handler retry loop for evaluation fetch (gives up after N attempts and logs); stuck calls reconciled every 60 s. |
| Graphy / EzExam | mrlearnSync, mrlearnReminder, mrlearnNewStudentSync, mrtestSync | per-config rows in Postgres (MrLearnSyncConfig, MrTestSyncConfig) + SystemConfig flags | Per-config schedules re-derived from DB on boot; a failing sync fails its job only. |
| WhatsApp | whatsapp.worker, mrlearnReminder.worker | WhatsApp provider env | Unknown job type throws (job fails). |
| AWS S3 | resumeAnalysis.worker | AWS_* env | Used to read resume objects; fetch errors fail the job. |
| Bull Board UI | dashboard | BULL_BOARD_USERNAME, BULL_BOARD_PASSWORD (set via env vars) | Basic-auth only; change credentials in any non-trivial environment. |
Feature flags / conditional schedules:
- MRLEARN_NEW_STUDENT_SYNC_ENABLED (SystemConfig) toggles the global new-student sync; when false the boot code calls unscheduleMrLearnNewStudentSync().
- MrLearnSyncConfig.enabled / .reminderEnabled and MrTestSyncConfig.enabled gate the per-config schedules.
- leadAssignment.worker import is commented out in src/index.ts, so the schedule exists but is unconsumed (inferred dead path).
Status lifecycles
BullMQ job lifecycle
Because no queue configures attempts/backoff, the retry branch below is available in BullMQ but not
exercised by this codebase's default options — a failed handler goes straight to failed and is retained per
removeOnFail. (Workers that "retry" do so with their own in-handler loops or by re-enqueueing follow-up jobs.)
stateDiagram-v2
[*] --> waiting: queue.add
[*] --> delayed: queue.add with delay or repeat
delayed --> waiting: delay elapses
waiting --> active: worker reserves job
active --> completed: handler resolves
active --> failed: handler throws
failed --> waiting: only if attempts remain
completed --> [*]: trimmed by removeOnComplete
failed --> [*]: trimmed by removeOnFail
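The retry branch in the diagram above only activates when a job carries attempts greater than 1. The sketch below models that decision and lists the delays an exponential backoff would produce. RetryOptions, afterThrow, and retryDelays are hypothetical helpers; the doubling formula mirrors BullMQ's built-in exponential strategy without jitter as far as I am aware, so treat the exact delays as an assumption rather than a guarantee.

```typescript
// The two job options that no queue in this codebase currently sets.
interface RetryOptions {
  attempts: number;
  backoff: { type: 'exponential'; delay: number };
}

// Outcome when a handler throws. `attemptsMade` includes the run that just
// failed, so it is 1 after the first throw. With the default attempts of 1
// there is nothing left to retry and the job goes straight to failed.
function afterThrow(attemptsMade: number, attempts: number = 1): 'retry' | 'failed' {
  return attemptsMade < attempts ? 'retry' : 'failed';
}

// Delay before each retry, doubling from the base delay.
function retryDelays(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < options.attempts; attemptsMade++) {
    delays.push(Math.round(Math.pow(2, attemptsMade - 1) * options.backoff.delay));
  }
  return delays;
}

const exampleOptions: RetryOptions = {
  attempts: 3,
  backoff: { type: 'exponential', delay: 1000 },
};

console.log(afterThrow(1));               // failed (today's behaviour)
console.log(afterThrow(1, 3));            // retry
console.log(retryDelays(exampleOptions)); // [ 1000, 2000 ]
```

With attempts: 3 a transient SMTP failure would get two more tries, roughly 1 s and 2 s apart, before the job lands in failed.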
Repeatable-job registration lifecycle
Every schedule*() helper follows the same idempotent pattern so boots never stack duplicate schedules.
stateDiagram-v2
[*] --> Scanning: schedule called on boot
Scanning --> Removing: getRepeatableJobs finds prior key
Removing --> Adding: removeRepeatableByKey
Scanning --> Adding: no prior key
Adding --> Scheduled: add with repeat and stable jobId
Scheduled --> [*]
Scheduled --> Removing: unschedule called for disabled config
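The scan, remove, add sequence in the diagram above can be sketched against a small structural interface. The method names getRepeatableJobs, removeRepeatableByKey, and add match the BullMQ Queue methods named in the diagram; RepeatableQueueLike, scheduleIdempotently, and the in-memory FakeQueue are hypothetical stand-ins so the example runs without Redis, and the fake's key format is invented.

```typescript
interface RepeatableJobInfo {
  key: string;
  name: string;
  id?: string | null;
}

// Structural subset of a BullMQ Queue that the pattern needs.
interface RepeatableQueueLike {
  getRepeatableJobs(): Promise<RepeatableJobInfo[]>;
  removeRepeatableByKey(key: string): Promise<boolean>;
  add(
    name: string,
    data: unknown,
    opts: { jobId: string; repeat: { every: number } },
  ): Promise<unknown>;
}

// Remove any prior schedule with the same name and stable jobId, then add.
async function scheduleIdempotently(
  queue: RepeatableQueueLike,
  name: string,
  jobId: string,
  everyMs: number,
): Promise<void> {
  const existing = await queue.getRepeatableJobs();
  for (const job of existing) {
    if (job.name === name && job.id === jobId) {
      await queue.removeRepeatableByKey(job.key);
    }
  }
  await queue.add(name, {}, { jobId, repeat: { every: everyMs } });
}

// In-memory stand-in. Its key includes the interval, so without the removal
// step a changed interval would stack a second schedule.
class FakeQueue implements RepeatableQueueLike {
  repeatables: Array<RepeatableJobInfo & { every: number }> = [];

  async getRepeatableJobs(): Promise<RepeatableJobInfo[]> {
    return [...this.repeatables];
  }

  async removeRepeatableByKey(key: string): Promise<boolean> {
    const before = this.repeatables.length;
    this.repeatables = this.repeatables.filter((job) => job.key !== key);
    return this.repeatables.length < before;
  }

  async add(
    name: string,
    _data: unknown,
    opts: { jobId: string; repeat: { every: number } },
  ): Promise<unknown> {
    const every = opts.repeat.every;
    this.repeatables.push({ key: `${name}:${opts.jobId}:${every}`, name, id: opts.jobId, every });
    return undefined;
  }
}
```

Calling scheduleIdempotently twice for the same jobId with different intervals leaves exactly one schedule in the fake, carrying the latest interval.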
Edge cases, limits & gotchas
- No automatic retries. No queue sets attempts or backoff, so the effective attempt count is 1. A transient failure (SMTP blip, provider timeout) drops the job to failed permanently; recovery is manual via Bull Board "retry" or a re-enqueue. Workers needing resilience implement their own loops (missOzone evaluation fetch) or re-enqueue follow-ups (resumeAnalysis ai-call-check).
- Cleanup comment vs reality. scheduleSlotCleanup is documented as "every 24 hours" but actually registers every: 15 * 60 * 1000 (15 minutes). Trust the code.
- leadAssignmentQueue has a producer + schedule but no consumer: the worker import is commented out in src/index.ts. Scheduled leadAutoAssignment jobs will pile up in waiting (inferred).
- notificationQueue has a producer (addNotificationJob) and appears in Bull Board but has no worker on this branch (inferred).
- Bull Board visibility gap. getQueues() returns 19 of 22 queues; warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not shown in the dashboard. Their workers still run.
- Idempotency relies on stable jobIds. Per-config/global schedules (mrlearn-sync-<id>, badge-evaluation-daily, student-risk-daily, mrlearn-new-student-sync, workflow-scan) replace rather than duplicate on re-registration. One-shot manual jobs do not dedupe: repeated manual triggers enqueue repeated jobs.
- Each worker opens its own Redis connection (inline host/port), separate from the cache RedisService. There is no shared connection pool; a Redis outage affects producers and all workers alike.
- Workers initialise the DB lazily. Most workers call DatabaseService.getInstance().initialize() and guard the handler with an if (!service) throw until the connection is ready. A job that arrives in that startup window throws and fails.
- Hardening note (internal). A security/hardening observation for this area is tracked in the team's private notes (internal/security-and-hardening-notes.md) and is intentionally not published on this site.
- Timezone. Cron schedules pin tz: 'Asia/Kolkata'; every-based schedules are timezone-agnostic wall-clock intervals from the moment of registration (i.e. effectively from each boot).
- Graceful shutdown (SIGINT / SIGTERM in src/index.ts) closes Socket.IO, the HTTP server, the DataSource, and the cache Redis client. It does not explicitly close() the BullMQ workers, so in-flight jobs may be interrupted rather than drained (inferred).
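One way to address the shutdown gap noted in the last bullet is to close workers before the resources their handlers depend on. The sketch below is an assumption about how that could look, not the project's shutdown code: Closable and shutdownInOrder are hypothetical, and the only library fact relied on is that a BullMQ Worker exposes close(), which waits for in-flight jobs to finish before resolving.

```typescript
// Anything with an async close(): BullMQ workers, the HTTP server wrapper,
// the DataSource wrapper, the cache Redis client, and so on.
interface Closable {
  close(): Promise<void>;
}

// Drain workers first, then close shared dependencies one by one.
async function shutdownInOrder(workers: Closable[], dependencies: Closable[]): Promise<void> {
  // Workers can drain in parallel; each stops reserving new jobs and
  // resolves once its active handlers have settled.
  await Promise.all(workers.map((worker) => worker.close()));

  // Only now is it safe to close what handlers were using.
  for (const dependency of dependencies) {
    await dependency.close();
  }
}
```

A SIGTERM handler could call this with the list of started workers and then exit. A timeout around the worker drain would be a sensible addition so a stuck handler cannot block deployment indefinitely.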
Related docs
- Architecture overview
- Email & notifications
- Mr. Hire — resume analysis & AI screening
- Sales CRM & workflow automation
- AI voice calling — Aarya & Miss Ozone
- Mr. Learn / Mr. Test external sync
- LMS — engagement, badges & student risk
- Meetings & slot lifecycle
- Admin dashboards & KPIs
- Data model overview
- DevOps & deployment