Background Jobs & Queues (Redis + BullMQ)¶

All work in the Mr. Mentor backend that must not block an HTTP request — sending email, AI resume screening, recomputing dashboard KPIs, syncing external LMS/exam platforms, driving the sales-automation workflow engine, reconciling AI phone calls — is pushed onto a BullMQ queue backed by a single Redis instance and consumed by one of ~21 long-lived Workers started at boot. This document is the canonical map of every queue, every worker, the producer→queue→worker→side-effect flow, the scheduling/cron timeline, the job lifecycle, and the Bull Board monitoring UI.

Status: documented from source on this branch.

Overview¶

What it does. Offloads slow, retryable, or scheduled work from the request path onto Redis-backed job queues. Producers (controllers, services, socket handlers, other workers) enqueue jobs; workers process them asynchronously with their own concurrency and rate limits.
Who uses it. Indirectly everyone — every product surface in the suite relies on at least one queue (Mr. Mentor meetings → email + cleanup, Mr. Hire → resumeAnalysis, LMS → daily-cards/badges/warnings/risk, Sales CRM → workflow/leadAssignment/whatsapp/aaryaSync, Finance → email invoices). Operators (ADMIN / SUPERADMIN) observe and replay jobs through Bull Board.
Where it sits. A cross-cutting infrastructure layer. The two singletons are src/services/QueueService.ts (the producer side — owns all Queue objects and add*/schedule* helpers) and src/config/redis.ts (RedisService, a general-purpose Redis client also used for caching). Workers live in src/workers/* and are dynamically imported during the boot sequence in src/index.ts.

Key facts to keep in mind throughout:

One Redis, one logical DB. Every queue and the cache share REDIS_HOST / REDIS_PORT (defaults localhost:6379). Password / DB-index lines are commented out in both QueueService.ts and redis.ts, so auth is currently off.
No BullMQ-level retries are configured. No queue sets attempts or backoff in its defaultJobOptions, so the effective attempts is 1 — a thrown error moves the job straight to failed (see Edge cases). Some workers implement their own in-handler retry loops.
Repeatable jobs are re-registered on every boot with stable jobIds, so a config change or a Redis flush is always reconciled at startup (src/index.ts).

Key concepts & entities¶

This domain owns infrastructure, not TypeORM entities — its "data" is Redis keys (BullMQ job hashes, wait/active/completed/failed lists, repeatable-job schedules). It does, however, read and write many business entities through the services each worker calls.

Concept	Meaning
Queue	A named BullMQ list in Redis (e.g. `emailQueue`). Created in `QueueService`'s constructor.
Job	A unit of work added with `queue.add(name, data, options)`. Carries a `name`, a `data` payload, and options.
Worker	A long-lived consumer bound to one queue name (`new Worker('emailQueue', processor, opts)`). One file per worker in `src/workers/`.
Repeatable job	A job with `repeat: { every }` or `repeat: { pattern }` (cron). BullMQ re-enqueues it on schedule. Registered via `schedule*()` helpers.
Stable jobId	A fixed `jobId` (e.g. `badge-evaluation-daily`, `mrlearn-sync-<configId>`) so re-registering replaces the prior schedule instead of stacking duplicates.
`defaultJobOptions`	Per-queue defaults — here only `removeOnComplete` / `removeOnFail` (how many finished jobs to retain for inspection).
`concurrency`	How many jobs a worker processes in parallel. Most are `1`; high-throughput queues use 3-5.
`limiter`	Rate cap. Only `emailQueue` sets one (`max: 10` per `1000` ms).

Relevant source files:

src/services/QueueService.ts — singleton producer + all queue definitions + all schedulers (1144 lines).
src/config/redis.ts — RedisService singleton (ioredis), lazyConnect, reconnect-on-READONLY.
src/workers/*.worker.ts — 21 worker files.
src/routes/bullBoard.routes.ts — Bull Board dashboard, basic-auth protected.
src/types/EmailQueue.types.ts, src/types/DatabaseQueue.types.ts, src/types/ResumeQueue.types.ts — job payload contracts.

Architecture¶

flowchart TD
  subgraph PROD["Producers"]
    CTRL["Controllers / Services / Socket handlers"]
    SCHED["Boot schedulers (src/index.ts)"]
    WRK2WRK["Worker-to-worker enqueues"]
  end

  QS["QueueService singleton (src/services/QueueService.ts)"]
  REDIS[("Redis (BullMQ + cache)")]

  subgraph WORKERS["Workers (src/workers)"]
    EMAIL["email.worker (x5)"]
    DB["database.worker (x3)"]
    RESUME["resumeAnalysis.worker (x5)"]
    WF["workflow.worker (x5)"]
    OTHERS["cleanup / kpi / warning / daily-cards / badges / risk / sync workers (x1)"]
  end

  subgraph SIDE["Side effects"]
    SMTP["Nodemailer / Gmail SMTP"]
    PG[("PostgreSQL via TypeORM")]
    S3["AWS S3"]
    EXT["ElevenLabs / LiveKit / Graphy / EzExam / WhatsApp / Exotel"]
    CACHE["Redis cache keys (KPIs, stats)"]
  end

  CTRL -->|"addEmailJob / addResumeAnalysisJob / addWorkflowStepJob"| QS
  SCHED -->|"schedule* repeatable jobs"| QS
  WRK2WRK --> QS
  QS -->|"queue.add"| REDIS
  REDIS -->|"reserve job"| EMAIL
  REDIS --> DB
  REDIS --> RESUME
  REDIS --> WF
  REDIS --> OTHERS

  EMAIL --> SMTP
  DB --> PG
  RESUME --> PG
  RESUME --> S3
  RESUME --> EXT
  WF --> PG
  WF --> EXT
  OTHERS --> PG
  OTHERS --> CACHE
  OTHERS --> EXT

Each worker opens its own Redis connection inline (it does not reuse RedisService). The producer side is a single QueueService instance obtained via QueueService.getInstance().

Data model¶

There are no SQL tables for this domain. The "schema" is the set of Redis structures BullMQ maintains per queue, plus the job payload shapes declared in src/types/*Queue.types.ts.

erDiagram
  QUEUE ||--o{ JOB : "holds"
  QUEUE ||--o{ REPEATABLE : "schedules"
  JOB ||--|| PAYLOAD : "carries"
  WORKER }o--|| QUEUE : "consumes"

  QUEUE {
    string name PK
    int removeOnComplete
    int removeOnFail
  }
  JOB {
    string id PK
    string name
    string state
    int attemptsMade
    json data
  }
  REPEATABLE {
    string jobId PK
    string everyOrPattern
    string tz
  }
  WORKER {
    string queueName PK
    int concurrency
    json limiter
  }
  PAYLOAD {
    string type
    json fields
  }

Representative payload contracts:

Email (EmailJobData, src/types/EmailQueue.types.ts): a discriminated union over type (meeting-approval, payment-invoice, crm-lead-email, workflow-email, welcome-email, ~30 variants).
Database (DatabaseJobData, src/types/DatabaseQueue.types.ts): create-token, update-mentor-rating, updateMentorMultiplier, deleteUser, backfill-applications.
Resume (ResumeAnalysisJobData, src/types/ResumeQueue.types.ts): type of resume-analysis / full-screening / ai-call plus applicationId, resumeUrl/resumeKey, jobPostId, authToken.

API surface¶

The only HTTP surface for this domain is the Bull Board dashboard. It is constructed in src/routes/bullBoard.routes.ts and mounted in src/routes/index.ts (this.router.use('/', this.bullBoardRoutes.router)), which is itself mounted at / by src/app.ts. Inside BullBoardRoutes the express adapter base path is /admin/queues and the router guards it with HTTP Basic auth.

Method	Path	Auth/role	Purpose
GET	`/admin/queues`	HTTP Basic (`BULL_BOARD_USERNAME` / `BULL_BOARD_PASSWORD` (credentials set via env vars))	Bull Board UI — list queues, inspect jobs
GET/POST	`/admin/queues/api/*`	HTTP Basic	Bull Board internal API (job lists, retry, promote, clean) used by the UI

Notes:

Bull Board registers 19 of the 22 queues (see QueueService.getQueues()). warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not in the Bull Board list and so are not visible in the dashboard, even though their workers run.
This is not under /api and does not use the JWT authMiddleware; it has its own Basic-auth gate.
There is no JSON REST API to enqueue jobs directly — production happens in-process via QueueService methods called from controllers/services. Several services also expose admin endpoints that internally call triggerMrLearnSyncNow, addWorkflowScanJob, addAaryaSyncJob, etc. (those endpoints belong to their feature docs).

Workers & queues — the master table¶

Concurrency / retention values are read from each worker file and QueueService defaultJobOptions. "Trigger/Schedule" lists how jobs reach the queue.

Worker (`src/workers/`)	Queue	Trigger / Schedule	Purpose
`email.worker` (conc 5, limiter 10/sec)	`emailQueue`	On demand via `addEmailJob`	Transactional email for ~30 `type`s — meeting approve/confirm/expire, onboarding (mentor/admin/sales/student/HR/CM/super-mentor), password-changed, role-changed, payment invoice, quiz invite, discount approved, batch-meeting, CRM lead email, telecaller assignment. Sends via Nodemailer/Gmail.
`whatsapp.worker` (conc 5)	`whatsappQueue`	On demand via `addWhatsAppJob`	WhatsApp sends. Currently handles `APPLICATION_MEETING_REMINDER` through `ApplicationMeetingReminderService`.
`database.worker` (conc 3)	`databaseQueue`	On demand + cron `0 2 * * *` IST	Heavy DB ops off the request path: `create-token`, `update-mentor-rating`, `updateMentorMultiplier`, `deleteUser`, and the scheduled `backfill-applications` job.
`cleanup.worker` (conc 1)	`cleanupQueue`	Repeatable `every 15m` (`scheduleSlotCleanup`)	Expire stale/unbooked slots via `SlotCleanupService`. (Doc comment says 24h; code is 15 min.)
`kpi.worker` (conc 1)	`kpiQueue`	Repeatable `every 15m` (two jobs)	Recompute + cache dashboard KPIs (`dashboardKpiCalculation`) and sales overview stats (`salesOverviewStatsCalculation`) into Redis via `AdminDashboardService`.
`resumeAnalysis.worker` (conc 5)	`resumeAnalysisQueue`	On demand via `addResumeAnalysisJob`	Mr. Hire screening pipeline: `resume-analysis` / `full-screening` (LLM scoring), `ai-call` (dispatch AI phone interview), `ai-call-check` (poll result). Re-enqueues follow-up jobs onto itself.
`salaryBenchmark.worker` (conc 1)	`salaryBenchmarkQueue`	Repeatable `every 15 days`	Refresh salary benchmark data via `SalaryBenchmarkService`.
`warning.worker` (conc 1)	`warningQueue`	Cron `59 23 * * *` IST + on demand	Process the day's attendance absences and create student warnings.
`leadAssignment.worker` (conc 1)	`leadAssignmentQueue`	Repeatable `every 15m` scheduled — but worker import is commented out in `index.ts`	Round-robin auto-assign unassigned CRM leads via `SalesDashboardService`. Schedule is registered yet no consumer runs, so jobs accumulate (inferred).
`dailyCards` (`daily-cards.worker`, conc 1)	`dailyCardsQueue`	Cron `0 0 * * *` IST	Generate per-student daily engagement/gamification cards (Redis-backed).
`course-plan.worker` (conc 1)	`coursePlanQueue`	On demand	Generate course-plan / lecture content "one lecture at a time".
`missOzone.worker` (conc 4)	`missOzoneQueue`	Repeatable `every 60s` (`poll-active-calls`) + cron `30 3 * * *` (`prune-old-calls`) + on demand	AI voice-mentor (Miss Ozone) call reconciliation against LiveKit, evaluation fetch with in-handler retries, and daily retention prune of transcript/eval blobs.
`mrlearnSync.worker` (conc 1)	`mrlearnSyncQueue`	Per-config repeatable `every intervalHours` + manual	Sync learners/progress from Graphy (Mr. Learn) per `MrLearnSyncConfig`. Supports `fast` incremental mode.
`mrlearnReminder.worker` (conc 1)	`mrlearnReminderQueue`	Per-config repeatable `every reminderIntervalHours` + manual	Send WhatsApp course-progress nudges to learners below a progress threshold.
`mrlearnNewStudentSync.worker` (conc 1)	`mrlearnNewStudentSyncQueue`	Global repeatable, `SystemConfig`-driven interval	Fan out a fast new-enrolment sync across all enabled `MrLearnSyncConfig`s.
`mrtestSync.worker` (conc 1)	`mrtestSyncQueue`	Per-config repeatable `every intervalHours` + manual	Sync exam/test data from EzExam (Mr. Test) per `MrTestSyncConfig`.
`badgeEvaluation.worker` (conc 1)	`badgeEvaluationQueue`	Repeatable `every 24h`	Safety-net cron re-running `BadgeService.evaluateForUser` for enrolled students (catches badges like `profile_pro` whose events skip `grantXp`).
`studentRiskComputation.worker` (conc 1)	`studentRiskComputationQueue`	Repeatable `every 24h`	Walk active batches and write fresh `student_risk_scores` rows via `StudentRiskService` (BL "Students at Risk" card).
`assignmentReminder.worker` (conc 1)	`assignmentReminderQueue`	Cron `0 19 * * *` IST	Notify students of assignments due in the next 24h (`assignment_due_24h` template, subject to a 3/day cap).
`aaryaSync.worker` (conc 1)	`aaryaSyncQueue`	Repeatable `every 15m` + on demand	Poll ElevenLabs for dispatched Aarya calls; write back duration + `conversation_id` + predicted lead interest (`AaryaCallSyncService`).
`workflow.worker` (conc 5)	`workflowQueue`	Repeatable scan `every 5m` + event/step/wake enqueues	Sales-automation engine: `workflowScan` (enroll matching leads), `workflowStep` (advance one enrollment by one node, incl. delayed Wait steps), `workflowEvent` (react to lead lifecycle events).
(no dedicated worker)	`notificationQueue`	`addNotificationJob`	Declared and exposed in Bull Board, but no `notification.worker` consumes it on this branch (inferred — notifications are largely handled inline / via email + whatsapp queues).

Producer→queue→worker→side-effects, for the busiest queues:

flowchart LR
  subgraph P["Producers"]
    MC["Meeting / Auth / Payment controllers"]
    HC["Mr. Hire job-application flow"]
    SC["Sales CRM lead events"]
    CR["Boot cron schedulers"]
  end

  MC -->|"addEmailJob"| EQ["emailQueue"]
  MC -->|"addDatabaseJob"| DQ["databaseQueue"]
  HC -->|"addResumeAnalysisJob"| RQ["resumeAnalysisQueue"]
  SC -->|"addWorkflowEventJob / addWorkflowStepJob"| WQ["workflowQueue"]
  CR -->|"every 15m"| KQ["kpiQueue"]
  CR -->|"every 15m"| CQ["cleanupQueue"]

  EQ --> EW["email.worker"] --> SMTP["Gmail SMTP"]
  DQ --> DW["database.worker"] --> PG[("PostgreSQL")]
  RQ --> RW["resumeAnalysis.worker"] --> LLM["LLM + AI call + S3"]
  WQ --> WW["workflow.worker"] --> WFX["lead state + Aarya calls + emails"]
  KQ --> KW["kpi.worker"] --> RC["Redis cache"]
  CQ --> CW["cleanup.worker"] --> PG

User journeys¶

1. Transactional email (e.g. meeting confirmation)¶

A controller never sends mail synchronously — it enqueues a job and returns immediately. The email.worker (concurrency 5, rate-limited to 10/sec) drains the queue.

sequenceDiagram
  participant FE as Frontend
  participant API as Meeting controller
  participant QS as QueueService
  participant R as Redis
  participant EW as email.worker
  participant SMTP as Gmail SMTP

  FE->>API: confirm meeting
  API->>QS: addEmailJob with type meeting-confirmation
  QS->>R: enqueue sendEmail job on emailQueue
  API-->>FE: 200 OK queued
  R->>EW: reserve job when a slot is free
  EW->>EW: switch on data.type then build template
  EW->>SMTP: send mail
  alt send succeeds
    SMTP-->>EW: accepted
    EW->>R: mark completed and keep last 50
  else send throws
    SMTP-->>EW: error
    EW->>R: mark failed and keep last 50
    Note over EW,R: no attempts configured so no auto-retry
  end

2. Mr. Hire resume screening (multi-stage, self-chaining)¶

The screening pipeline is the most complex flow. A single application can pass through resume-analysis, full-screening, ai-call, and ai-call-check — and the worker enqueues its own follow-up jobs rather than blocking on a long AI phone call.

sequenceDiagram
  participant API as Job-application API
  participant QS as QueueService
  participant R as Redis
  participant RW as resumeAnalysis.worker
  participant S3 as AWS S3
  participant AI as LLM and AI-call provider
  participant PG as PostgreSQL

  API->>QS: addResumeAnalysisJob type full-screening
  QS->>R: enqueue analyzeResume job
  R->>RW: reserve job
  RW->>S3: fetch resume by key
  RW->>AI: score resume against job post
  RW->>PG: write ScreeningResult and tier
  alt rules say make AI call
    RW->>AI: dispatch AI phone interview
    RW->>QS: enqueue follow-up ai-call-check job
    Note over RW: worker released instead of blocking 6 plus minutes
    R->>RW: reserve ai-call-check later
    RW->>AI: poll call result
    alt result ready
      RW->>PG: store communication score and transcript
    else still running
      RW->>QS: re-enqueue ai-call-check with incremented count
    end
  else no call needed
    RW->>PG: finalize composite score
  end

3. Scheduled KPI recompute (repeatable every 15 minutes)¶

sequenceDiagram
  participant Boot as src index startServer
  participant QS as QueueService
  participant R as Redis
  participant KW as kpi.worker
  participant ADS as AdminDashboardService
  participant PG as PostgreSQL
  participant Cache as Redis cache

  Boot->>QS: scheduleKpiCalculation
  QS->>R: remove old repeatable then add two repeatable jobs every 15m
  loop every 15 minutes
    R->>KW: deliver dashboardKpiCalculation job
    KW->>ADS: compute dashboard KPIs
    ADS->>PG: aggregate queries
    KW->>Cache: write cached KPI payload
    R->>KW: deliver salesOverviewStatsCalculation job
    KW->>Cache: write cached sales overview
  end

4. Sales-automation workflow engine (scan + step + event)¶

The workflowQueue carries three job names. The 5-minute workflowScan enrolls newly-matching leads; each enrollment then advances one node per workflowStep job, with delayed jobs used for Wait nodes; lead lifecycle changes publish workflowEvent jobs for event-driven workflows.

sequenceDiagram
  participant Boot as Boot scheduler
  participant CRM as CRM lead change
  participant QS as QueueService
  participant R as Redis
  participant WW as workflow.worker
  participant ENG as WorkflowEngineService
  participant PG as PostgreSQL

  Boot->>QS: scheduleWorkflowScan every 5m
  CRM->>QS: addWorkflowEventJob with leadIds
  QS->>R: enqueue workflowScan and workflowEvent jobs
  R->>WW: reserve workflowScan
  WW->>ENG: runTriggerScan
  ENG->>PG: find matching leads and create enrollments
  ENG->>QS: addWorkflowStepJob per new enrollment
  R->>WW: reserve workflowStep
  WW->>ENG: processEnrollmentStep
  alt node is a Wait step
    ENG->>QS: re-enqueue workflowStep with delay
  else node is an action
    ENG->>PG: apply action and advance pointer
  end

5. Aarya / Miss Ozone AI-call reconciliation (poll-back)¶

AI phone calls are dispatched elsewhere and reconciled by repeatable pollers, because the provider result is not available synchronously.

sequenceDiagram
  participant Boot as Boot scheduler
  participant QS as QueueService
  participant R as Redis
  participant AW as aaryaSync.worker
  participant EL as ElevenLabs API
  participant PG as PostgreSQL

  Boot->>QS: scheduleAaryaSync every 15m
  QS->>R: add repeatable aaryaCallSync job
  loop every 15 minutes
    R->>AW: deliver aaryaCallSync job
    AW->>EL: list dispatched-but-unfinished conversations
    alt conversation finished
      EL-->>AW: duration and conversation id and transcript
      AW->>PG: update LeadCallLog and RawLead predicted interest
    else still in progress
      Note over AW: leave for next tick
    end
  end

missOzone.worker follows the same pattern at a tighter cadence — poll-active-calls every 60 s reconciles calls stuck in scheduled/dialing/completed against LiveKit, with an in-handler retry loop for evaluation fetches, and a daily prune-old-calls at 03:30 that NULLs heavy transcript/eval columns and deletes long-failed rows.

6. External LMS / exam sync (per-config repeatable)¶

sequenceDiagram
  participant Admin as Admin panel
  participant Svc as MrLearn config service
  participant QS as QueueService
  participant R as Redis
  participant SW as mrlearnSync.worker
  participant GR as Graphy API
  participant PG as PostgreSQL

  Admin->>Svc: enable sync config with intervalHours
  Svc->>QS: scheduleMrLearnSync configId intervalHours
  QS->>R: add repeatable job with stable id mrlearn-sync configId
  loop every intervalHours
    R->>SW: deliver mrlearnSync job
    SW->>GR: pull learners and progress
    SW->>PG: upsert learner rows
  end
  Admin->>Svc: trigger manual sync now
  Svc->>QS: triggerMrLearnSyncNow with fast flag
  QS->>R: enqueue one-shot job
  R->>SW: deliver and process immediately

On boot, src/index.ts re-reads MrLearnSyncConfig (enabled / reminderEnabled), MrTestSyncConfig (enabled), and SystemConfig keys (MRLEARN_NEW_STUDENT_SYNC_ENABLED / ..._INTERVAL_HOURS) from Postgres and recreates every Redis-side schedule, so DB config is always the source of truth even after a Redis flush.

Background jobs & async¶

Scheduling / cron timeline¶

All cron patterns use tz: 'Asia/Kolkata' (IST) unless noted; every intervals are wall-clock from registration.

Cadence	Job(s)	Queue	Registered by
every 60 s	`poll-active-calls`	`missOzoneQueue`	`scheduleMissOzonePolling`
every 5 min	`workflowScan`	`workflowQueue`	`scheduleWorkflowScan`
every 15 min	`slotCleanup`	`cleanupQueue`	`scheduleSlotCleanup`
every 15 min	`dashboardKpiCalculation` + `salesOverviewStatsCalculation`	`kpiQueue`	`scheduleKpiCalculation`
every 15 min	`aaryaCallSync`	`aaryaSyncQueue`	`scheduleAaryaSync`
every 15 min	`leadAutoAssignment` (consumer disabled)	`leadAssignmentQueue`	`scheduleLeadAutoAssignment`
19:00 IST (`0 19 * * *`)	`remindAssignmentsDue`	`assignmentReminderQueue`	`scheduleAssignmentReminders`
23:59 IST (`59 23 * * *`)	`dailyWarningProcessing`	`warningQueue`	`scheduleDailyWarningProcessing`
02:00 IST (`0 2 * * *`)	`backfillApplications`	`databaseQueue`	`scheduleApplicationBackfill`
03:30 (`30 3 * * *`)	`prune-old-calls`	`missOzoneQueue`	`scheduleMissOzonePrune`
00:00 IST (`0 0 * * *`)	`generateDailyCards`	`dailyCardsQueue`	`scheduleDailyCards`
every 24 h	`badgeEvaluation`	`badgeEvaluationQueue`	`scheduleBadgeEvaluation(24)`
every 24 h	`studentRiskComputation`	`studentRiskComputationQueue`	`scheduleStudentRiskComputation(24)`
every 15 days	`salaryBenchmarkRefresh`	`salaryBenchmarkQueue`	`scheduleSalaryBenchmarkRefresh`
per-config `every intervalHours`	`mrlearnSync` / `mrtestSync`	`mrlearnSyncQueue` / `mrtestSyncQueue`	`scheduleMrLearnSync` / `scheduleMrTestSync`
per-config `every reminderIntervalHours`	`mrlearnReminder`	`mrlearnReminderQueue`	`scheduleMrLearnReminder`
`SystemConfig`-driven `every Nh`	`mrlearnNewStudentSync`	`mrlearnNewStudentSyncQueue`	`scheduleMrLearnNewStudentSync`

flowchart LR
  T60["60s: missOzone poll"]
  T5["5m: workflow scan"]
  T15["15m: cleanup, kpi x2, aaryaSync, leadAssign"]
  D1["daily: dailyCards 00:00, backfill 02:00, prune 03:30, assignReminder 19:00, warning 23:59"]
  H24["24h: badges, student risk"]
  D15["15d: salary benchmark"]
  CFG["per-config: mrlearn/mrtest sync, mrlearn reminder, new-student sync"]

Two scheduling styles coexist: repeat: { every: ms } (fixed interval from registration) and repeat: { pattern: cron, tz } (wall-clock cron). Per-config and per-flag schedules use a stable jobId (e.g. mrlearn-sync-<configId>) so re-registration replaces rather than duplicates.

Worker concurrency & retention summary¶

Property	Value(s)	Where
Concurrency 5	`email`, `whatsapp`, `resumeAnalysis`, `workflow`	each worker `opts.concurrency`
Concurrency 4	`missOzone`	`missOzone.worker`
Concurrency 3	`database`	`database.worker`
Concurrency 1	all remaining workers	their worker files
Rate limiter	`emailQueue` only — `max 10` per `1000 ms`	`email.worker`
Retention	`removeOnComplete` / `removeOnFail` ranging 10-200 per queue	`QueueService` `defaultJobOptions`
Retries / backoff	none configured (`attempts` defaults to 1)	absent from all `defaultJobOptions`

Socket events & webhooks¶

This domain has no Socket.IO events or inbound webhooks of its own. (Socket.IO lives in src/socket.ts for meetings; provider callbacks for Exotel/Razorpay/Leegality are HTTP routes owned by their feature docs.) AI-call results are obtained by polling workers (aaryaSync, missOzone, resumeAnalysis ai-call-check), not by webhook.

External integrations¶

System	Used by worker(s)	Env / config	Failure / fallback
Redis	all queues + workers	`REDIS_HOST`, `REDIS_PORT` (defaults `localhost:6379`); password/db commented out	`RedisService` uses `lazyConnect` + reconnect-on-`READONLY`; if Redis is down at boot, `initializeRedis()` throws and the process exits.
Gmail SMTP (Nodemailer)	`email.worker`	`EMAIL_USER`, `EMAIL_PASS`	Auth failures (535) throw → job marked failed (no auto-retry); see Memory note on rotating the app password and replaying stuck jobs.
ElevenLabs (Aarya)	`aaryaSync.worker`, `resumeAnalysis.worker` (`ai-call`)	provider keys (feature env)	Single-flight `concurrency: 1` on aaryaSync to avoid hammering; unfinished conversations are simply re-polled next tick.
LiveKit (Miss Ozone)	`missOzone.worker`	LiveKit base/API env	In-handler retry loop for evaluation fetch (gives up after N attempts and logs); stuck calls reconciled every 60 s.
Graphy / EzExam	`mrlearnSync`, `mrlearnReminder`, `mrlearnNewStudentSync`, `mrtestSync`	per-config rows in Postgres (`MrLearnSyncConfig`, `MrTestSyncConfig`) + `SystemConfig` flags	Per-config schedules re-derived from DB on boot; a failing sync fails its job only.
WhatsApp	`whatsapp.worker`, `mrlearnReminder.worker`	WhatsApp provider env	Unknown job `type` throws (job fails).
AWS S3	`resumeAnalysis.worker`	`AWS_*` env	Used to read resume objects; fetch errors fail the job.
Bull Board UI	dashboard	`BULL_BOARD_USERNAME`, `BULL_BOARD_PASSWORD` ( (credentials set via env vars))	Basic-auth only — change credentials in any non-trivial environment.

Feature flags / conditional schedules:

MRLEARN_NEW_STUDENT_SYNC_ENABLED (SystemConfig) toggles the global new-student sync; when false the boot code calls unscheduleMrLearnNewStudentSync().
MrLearnSyncConfig.enabled / .reminderEnabled and MrTestSyncConfig.enabled gate the per-config schedules.
leadAssignment.worker import is commented out in src/index.ts — the schedule exists but is unconsumed (inferred dead path).

Status lifecycles¶

BullMQ job lifecycle¶

Because no queue configures attempts/backoff, the retry branch below is available in BullMQ but not exercised by this codebase's default options — a failed handler goes straight to failed and is retained per removeOnFail. (Workers that "retry" do so with their own in-handler loops or by re-enqueueing follow-up jobs.)

stateDiagram-v2
  [*] --> waiting: queue.add
  waiting --> delayed: job has delay or repeat
  delayed --> waiting: delay elapses
  waiting --> active: worker reserves job
  active --> completed: handler resolves
  active --> failed: handler throws
  failed --> waiting: only if attempts remain
  completed --> [*]: trimmed by removeOnComplete
  failed --> [*]: trimmed by removeOnFail

Repeatable-job registration lifecycle¶

Every schedule*() helper follows the same idempotent pattern so boots never stack duplicate schedules.

stateDiagram-v2
  [*] --> Scanning: schedule called on boot
  Scanning --> Removing: getRepeatableJobs finds prior key
  Removing --> Adding: removeRepeatableByKey
  Scanning --> Adding: no prior key
  Adding --> Scheduled: add with repeat and stable jobId
  Scheduled --> [*]
  Scheduled --> Removing: unschedule called for disabled config

Edge cases, limits & gotchas¶

No automatic retries. No queue sets attempts or backoff, so the effective attempt count is 1. A transient failure (SMTP blip, provider timeout) drops the job to failed permanently; recovery is manual via Bull Board "retry" or a re-enqueue. Workers needing resilience implement their own loops (missOzone evaluation fetch) or re-enqueue follow-ups (resumeAnalysis ai-call-check).
Cleanup comment vs reality. scheduleSlotCleanup is documented as "every 24 hours" but actually registers every: 15 * 60 * 1000 (15 minutes). Trust the code.
leadAssignmentQueue has a producer + schedule but no consumer — the worker import is commented out in src/index.ts. Scheduled leadAutoAssignment jobs will pile up in waiting (inferred).
notificationQueue has a producer (addNotificationJob) and appears in Bull Board but has no worker on this branch (inferred).
Bull Board visibility gap. getQueues() returns 19 of 22 queues; warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not shown in the dashboard. Their workers still run.
Idempotency relies on stable jobIds. Per-config/global schedules (mrlearn-sync-<id>, badge-evaluation-daily, student-risk-daily, mrlearn-new-student-sync, workflow-scan) replace rather than duplicate on re-registration. One-shot manual jobs do not dedupe — repeated manual triggers enqueue repeated jobs.
Each worker opens its own Redis connection (inline host/port), separate from the cache RedisService. There is no shared connection pool; a Redis outage affects producers and all workers alike.
Workers initialise the DB lazily. Most workers call DatabaseService.getInstance().initialize() and guard the handler with an if (!service) throw until the connection is ready. A job that arrives in that startup window throws and fails.
Hardening note (internal). A security/hardening observation for this area is tracked in the team's private notes (internal/security-and-hardening-notes.md) and is intentionally not published on this site.
Timezone. Cron schedules pin tz: 'Asia/Kolkata'; every-based schedules are timezone-agnostic wall-clock intervals from the moment of registration (i.e. effectively from each boot).
Graceful shutdown (SIGINT/SIGTERM in src/index.ts) closes Socket.IO, the HTTP server, the DataSource, and the cache Redis client — it does not explicitly close() the BullMQ workers, so in-flight jobs may be interrupted rather than drained (inferred).