Skip to content

Background Jobs & Queues (Redis + BullMQ)

All work in the Mr. Mentor backend that must not block an HTTP request — sending email, AI resume screening, recomputing dashboard KPIs, syncing external LMS/exam platforms, driving the sales-automation workflow engine, reconciling AI phone calls — is pushed onto a BullMQ queue backed by a single Redis instance and consumed by one of ~21 long-lived Workers started at boot. This document is the canonical map of every queue, every worker, the producer→queue→worker→side-effect flow, the scheduling/cron timeline, the job lifecycle, and the Bull Board monitoring UI.

Status: documented from source on this branch.


Overview

  • What it does. Offloads slow, retryable, or scheduled work from the request path onto Redis-backed job queues. Producers (controllers, services, socket handlers, other workers) enqueue jobs; workers process them asynchronously with their own concurrency and rate limits.
  • Who uses it. Indirectly everyone — every product surface in the suite relies on at least one queue (Mr. Mentor meetings → email + cleanup, Mr. Hire → resumeAnalysis, LMS → daily-cards/badges/warnings/risk, Sales CRM → workflow/leadAssignment/whatsapp/aaryaSync, Finance → email invoices). Operators (ADMIN / SUPERADMIN) observe and replay jobs through Bull Board.
  • Where it sits. A cross-cutting infrastructure layer. The two singletons are src/services/QueueService.ts (the producer side — owns all Queue objects and add*/schedule* helpers) and src/config/redis.ts (RedisService, a general-purpose Redis client also used for caching). Workers live in src/workers/* and are dynamically imported during the boot sequence in src/index.ts.

Key facts to keep in mind throughout:

  • One Redis, one logical DB. Every queue and the cache share REDIS_HOST / REDIS_PORT (defaults localhost:6379). Password / DB-index lines are commented out in both QueueService.ts and redis.ts, so auth is currently off.
  • No BullMQ-level retries are configured. No queue sets attempts or backoff in its defaultJobOptions, so the effective attempts is 1 — a thrown error moves the job straight to failed (see Edge cases). Some workers implement their own in-handler retry loops.
  • Repeatable jobs are re-registered on every boot with stable jobIds, so a config change or a Redis flush is always reconciled at startup (src/index.ts).

Key concepts & entities

This domain owns infrastructure, not TypeORM entities — its "data" is Redis keys (BullMQ job hashes, wait/active/completed/failed lists, repeatable-job schedules). It does, however, read and write many business entities through the services each worker calls.

Concept Meaning
Queue A named BullMQ list in Redis (e.g. emailQueue). Created in QueueService's constructor.
Job A unit of work added with queue.add(name, data, options). Carries a name, a data payload, and options.
Worker A long-lived consumer bound to one queue name (new Worker('emailQueue', processor, opts)). One file per worker in src/workers/.
Repeatable job A job with repeat: { every } or repeat: { pattern } (cron). BullMQ re-enqueues it on schedule. Registered via schedule*() helpers.
Stable jobId A fixed jobId (e.g. badge-evaluation-daily, mrlearn-sync-<configId>) so re-registering replaces the prior schedule instead of stacking duplicates.
defaultJobOptions Per-queue defaults — here only removeOnComplete / removeOnFail (how many finished jobs to retain for inspection).
concurrency How many jobs a worker processes in parallel. Most are 1; high-throughput queues use 3-5.
limiter Rate cap. Only emailQueue sets one (max: 10 per 1000 ms).

Relevant source files:

  • src/services/QueueService.ts — singleton producer + all queue definitions + all schedulers (1144 lines).
  • src/config/redis.tsRedisService singleton (ioredis), lazyConnect, reconnect-on-READONLY.
  • src/workers/*.worker.ts — 21 worker files.
  • src/routes/bullBoard.routes.ts — Bull Board dashboard, basic-auth protected.
  • src/types/EmailQueue.types.ts, src/types/DatabaseQueue.types.ts, src/types/ResumeQueue.types.ts — job payload contracts.

Architecture

flowchart TD
  subgraph PROD["Producers"]
    CTRL["Controllers / Services / Socket handlers"]
    SCHED["Boot schedulers (src/index.ts)"]
    WRK2WRK["Worker-to-worker enqueues"]
  end

  QS["QueueService singleton (src/services/QueueService.ts)"]
  REDIS[("Redis (BullMQ + cache)")]

  subgraph WORKERS["Workers (src/workers)"]
    EMAIL["email.worker (x5)"]
    DB["database.worker (x3)"]
    RESUME["resumeAnalysis.worker (x5)"]
    WF["workflow.worker (x5)"]
    OTHERS["cleanup / kpi / warning / daily-cards / badges / risk / sync workers (x1)"]
  end

  subgraph SIDE["Side effects"]
    SMTP["Nodemailer / Gmail SMTP"]
    PG[("PostgreSQL via TypeORM")]
    S3["AWS S3"]
    EXT["ElevenLabs / LiveKit / Graphy / EzExam / WhatsApp / Exotel"]
    CACHE["Redis cache keys (KPIs, stats)"]
  end

  CTRL -->|"addEmailJob / addResumeAnalysisJob / addWorkflowStepJob"| QS
  SCHED -->|"schedule* repeatable jobs"| QS
  WRK2WRK --> QS
  QS -->|"queue.add"| REDIS
  REDIS -->|"reserve job"| EMAIL
  REDIS --> DB
  REDIS --> RESUME
  REDIS --> WF
  REDIS --> OTHERS

  EMAIL --> SMTP
  DB --> PG
  RESUME --> PG
  RESUME --> S3
  RESUME --> EXT
  WF --> PG
  WF --> EXT
  OTHERS --> PG
  OTHERS --> CACHE
  OTHERS --> EXT

Each worker opens its own Redis connection inline (it does not reuse RedisService). The producer side is a single QueueService instance obtained via QueueService.getInstance().


Data model

There are no SQL tables for this domain. The "schema" is the set of Redis structures BullMQ maintains per queue, plus the job payload shapes declared in src/types/*Queue.types.ts.

erDiagram
  QUEUE ||--o{ JOB : "holds"
  QUEUE ||--o{ REPEATABLE : "schedules"
  JOB ||--|| PAYLOAD : "carries"
  WORKER }o--|| QUEUE : "consumes"

  QUEUE {
    string name PK
    int removeOnComplete
    int removeOnFail
  }
  JOB {
    string id PK
    string name
    string state
    int attemptsMade
    json data
  }
  REPEATABLE {
    string jobId PK
    string everyOrPattern
    string tz
  }
  WORKER {
    string queueName PK
    int concurrency
    json limiter
  }
  PAYLOAD {
    string type
    json fields
  }

Representative payload contracts:

  • Email (EmailJobData, src/types/EmailQueue.types.ts): a discriminated union over type (meeting-approval, payment-invoice, crm-lead-email, workflow-email, welcome-email, ~30 variants).
  • Database (DatabaseJobData, src/types/DatabaseQueue.types.ts): create-token, update-mentor-rating, updateMentorMultiplier, deleteUser, backfill-applications.
  • Resume (ResumeAnalysisJobData, src/types/ResumeQueue.types.ts): type of resume-analysis / full-screening / ai-call plus applicationId, resumeUrl/resumeKey, jobPostId, authToken.

API surface

The only HTTP surface for this domain is the Bull Board dashboard. It is constructed in src/routes/bullBoard.routes.ts and mounted in src/routes/index.ts (this.router.use('/', this.bullBoardRoutes.router)), which is itself mounted at / by src/app.ts. Inside BullBoardRoutes the express adapter base path is /admin/queues and the router guards it with HTTP Basic auth.

Method Path Auth/role Purpose
GET /admin/queues HTTP Basic (BULL_BOARD_USERNAME / BULL_BOARD_PASSWORD (credentials set via env vars)) Bull Board UI — list queues, inspect jobs
GET/POST /admin/queues/api/* HTTP Basic Bull Board internal API (job lists, retry, promote, clean) used by the UI

Notes:

  • Bull Board registers 19 of the 22 queues (see QueueService.getQueues()). warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not in the Bull Board list and so are not visible in the dashboard, even though their workers run.
  • This is not under /api and does not use the JWT authMiddleware; it has its own Basic-auth gate.
  • There is no JSON REST API to enqueue jobs directly — production happens in-process via QueueService methods called from controllers/services. Several services also expose admin endpoints that internally call triggerMrLearnSyncNow, addWorkflowScanJob, addAaryaSyncJob, etc. (those endpoints belong to their feature docs).

Workers & queues — the master table

Concurrency / retention values are read from each worker file and QueueService defaultJobOptions. "Trigger/Schedule" lists how jobs reach the queue.

Worker (src/workers/) Queue Trigger / Schedule Purpose
email.worker (conc 5, limiter 10/sec) emailQueue On demand via addEmailJob Transactional email for ~30 types — meeting approve/confirm/expire, onboarding (mentor/admin/sales/student/HR/CM/super-mentor), password-changed, role-changed, payment invoice, quiz invite, discount approved, batch-meeting, CRM lead email, telecaller assignment. Sends via Nodemailer/Gmail.
whatsapp.worker (conc 5) whatsappQueue On demand via addWhatsAppJob WhatsApp sends. Currently handles APPLICATION_MEETING_REMINDER through ApplicationMeetingReminderService.
database.worker (conc 3) databaseQueue On demand + cron 0 2 * * * IST Heavy DB ops off the request path: create-token, update-mentor-rating, updateMentorMultiplier, deleteUser, and the scheduled backfill-applications job.
cleanup.worker (conc 1) cleanupQueue Repeatable every 15m (scheduleSlotCleanup) Expire stale/unbooked slots via SlotCleanupService. (Doc comment says 24h; code is 15 min.)
kpi.worker (conc 1) kpiQueue Repeatable every 15m (two jobs) Recompute + cache dashboard KPIs (dashboardKpiCalculation) and sales overview stats (salesOverviewStatsCalculation) into Redis via AdminDashboardService.
resumeAnalysis.worker (conc 5) resumeAnalysisQueue On demand via addResumeAnalysisJob Mr. Hire screening pipeline: resume-analysis / full-screening (LLM scoring), ai-call (dispatch AI phone interview), ai-call-check (poll result). Re-enqueues follow-up jobs onto itself.
salaryBenchmark.worker (conc 1) salaryBenchmarkQueue Repeatable every 15 days Refresh salary benchmark data via SalaryBenchmarkService.
warning.worker (conc 1) warningQueue Cron 59 23 * * * IST + on demand Process the day's attendance absences and create student warnings.
leadAssignment.worker (conc 1) leadAssignmentQueue Repeatable every 15m scheduled — but worker import is commented out in index.ts Round-robin auto-assign unassigned CRM leads via SalesDashboardService. Schedule is registered yet no consumer runs, so jobs accumulate (inferred).
dailyCards (daily-cards.worker, conc 1) dailyCardsQueue Cron 0 0 * * * IST Generate per-student daily engagement/gamification cards (Redis-backed).
course-plan.worker (conc 1) coursePlanQueue On demand Generate course-plan / lecture content "one lecture at a time".
missOzone.worker (conc 4) missOzoneQueue Repeatable every 60s (poll-active-calls) + cron 30 3 * * * (prune-old-calls) + on demand AI voice-mentor (Miss Ozone) call reconciliation against LiveKit, evaluation fetch with in-handler retries, and daily retention prune of transcript/eval blobs.
mrlearnSync.worker (conc 1) mrlearnSyncQueue Per-config repeatable every intervalHours + manual Sync learners/progress from Graphy (Mr. Learn) per MrLearnSyncConfig. Supports fast incremental mode.
mrlearnReminder.worker (conc 1) mrlearnReminderQueue Per-config repeatable every reminderIntervalHours + manual Send WhatsApp course-progress nudges to learners below a progress threshold.
mrlearnNewStudentSync.worker (conc 1) mrlearnNewStudentSyncQueue Global repeatable, SystemConfig-driven interval Fan out a fast new-enrolment sync across all enabled MrLearnSyncConfigs.
mrtestSync.worker (conc 1) mrtestSyncQueue Per-config repeatable every intervalHours + manual Sync exam/test data from EzExam (Mr. Test) per MrTestSyncConfig.
badgeEvaluation.worker (conc 1) badgeEvaluationQueue Repeatable every 24h Safety-net cron re-running BadgeService.evaluateForUser for enrolled students (catches badges like profile_pro whose events skip grantXp).
studentRiskComputation.worker (conc 1) studentRiskComputationQueue Repeatable every 24h Walk active batches and write fresh student_risk_scores rows via StudentRiskService (BL "Students at Risk" card).
assignmentReminder.worker (conc 1) assignmentReminderQueue Cron 0 19 * * * IST Notify students of assignments due in the next 24h (assignment_due_24h template, subject to a 3/day cap).
aaryaSync.worker (conc 1) aaryaSyncQueue Repeatable every 15m + on demand Poll ElevenLabs for dispatched Aarya calls; write back duration + conversation_id + predicted lead interest (AaryaCallSyncService).
workflow.worker (conc 5) workflowQueue Repeatable scan every 5m + event/step/wake enqueues Sales-automation engine: workflowScan (enroll matching leads), workflowStep (advance one enrollment by one node, incl. delayed Wait steps), workflowEvent (react to lead lifecycle events).
(no dedicated worker) notificationQueue addNotificationJob Declared and exposed in Bull Board, but no notification.worker consumes it on this branch (inferred — notifications are largely handled inline / via email + whatsapp queues).

Producer→queue→worker→side-effects, for the busiest queues:

flowchart LR
  subgraph P["Producers"]
    MC["Meeting / Auth / Payment controllers"]
    HC["Mr. Hire job-application flow"]
    SC["Sales CRM lead events"]
    CR["Boot cron schedulers"]
  end

  MC -->|"addEmailJob"| EQ["emailQueue"]
  MC -->|"addDatabaseJob"| DQ["databaseQueue"]
  HC -->|"addResumeAnalysisJob"| RQ["resumeAnalysisQueue"]
  SC -->|"addWorkflowEventJob / addWorkflowStepJob"| WQ["workflowQueue"]
  CR -->|"every 15m"| KQ["kpiQueue"]
  CR -->|"every 15m"| CQ["cleanupQueue"]

  EQ --> EW["email.worker"] --> SMTP["Gmail SMTP"]
  DQ --> DW["database.worker"] --> PG[("PostgreSQL")]
  RQ --> RW["resumeAnalysis.worker"] --> LLM["LLM + AI call + S3"]
  WQ --> WW["workflow.worker"] --> WFX["lead state + Aarya calls + emails"]
  KQ --> KW["kpi.worker"] --> RC["Redis cache"]
  CQ --> CW["cleanup.worker"] --> PG

User journeys

1. Transactional email (e.g. meeting confirmation)

A controller never sends mail synchronously — it enqueues a job and returns immediately. The email.worker (concurrency 5, rate-limited to 10/sec) drains the queue.

sequenceDiagram
  participant FE as Frontend
  participant API as Meeting controller
  participant QS as QueueService
  participant R as Redis
  participant EW as email.worker
  participant SMTP as Gmail SMTP

  FE->>API: confirm meeting
  API->>QS: addEmailJob with type meeting-confirmation
  QS->>R: enqueue sendEmail job on emailQueue
  API-->>FE: 200 OK queued
  R->>EW: reserve job when a slot is free
  EW->>EW: switch on data.type then build template
  EW->>SMTP: send mail
  alt send succeeds
    SMTP-->>EW: accepted
    EW->>R: mark completed and keep last 50
  else send throws
    SMTP-->>EW: error
    EW->>R: mark failed and keep last 50
    Note over EW,R: no attempts configured so no auto-retry
  end

2. Mr. Hire resume screening (multi-stage, self-chaining)

The screening pipeline is the most complex flow. A single application can pass through resume-analysis, full-screening, ai-call, and ai-call-check — and the worker enqueues its own follow-up jobs rather than blocking on a long AI phone call.

sequenceDiagram
  participant API as Job-application API
  participant QS as QueueService
  participant R as Redis
  participant RW as resumeAnalysis.worker
  participant S3 as AWS S3
  participant AI as LLM and AI-call provider
  participant PG as PostgreSQL

  API->>QS: addResumeAnalysisJob type full-screening
  QS->>R: enqueue analyzeResume job
  R->>RW: reserve job
  RW->>S3: fetch resume by key
  RW->>AI: score resume against job post
  RW->>PG: write ScreeningResult and tier
  alt rules say make AI call
    RW->>AI: dispatch AI phone interview
    RW->>QS: enqueue follow-up ai-call-check job
    Note over RW: worker released instead of blocking 6 plus minutes
    R->>RW: reserve ai-call-check later
    RW->>AI: poll call result
    alt result ready
      RW->>PG: store communication score and transcript
    else still running
      RW->>QS: re-enqueue ai-call-check with incremented count
    end
  else no call needed
    RW->>PG: finalize composite score
  end

3. Scheduled KPI recompute (repeatable every 15 minutes)

sequenceDiagram
  participant Boot as src index startServer
  participant QS as QueueService
  participant R as Redis
  participant KW as kpi.worker
  participant ADS as AdminDashboardService
  participant PG as PostgreSQL
  participant Cache as Redis cache

  Boot->>QS: scheduleKpiCalculation
  QS->>R: remove old repeatable then add two repeatable jobs every 15m
  loop every 15 minutes
    R->>KW: deliver dashboardKpiCalculation job
    KW->>ADS: compute dashboard KPIs
    ADS->>PG: aggregate queries
    KW->>Cache: write cached KPI payload
    R->>KW: deliver salesOverviewStatsCalculation job
    KW->>Cache: write cached sales overview
  end

4. Sales-automation workflow engine (scan + step + event)

The workflowQueue carries three job names. The 5-minute workflowScan enrolls newly-matching leads; each enrollment then advances one node per workflowStep job, with delayed jobs used for Wait nodes; lead lifecycle changes publish workflowEvent jobs for event-driven workflows.

sequenceDiagram
  participant Boot as Boot scheduler
  participant CRM as CRM lead change
  participant QS as QueueService
  participant R as Redis
  participant WW as workflow.worker
  participant ENG as WorkflowEngineService
  participant PG as PostgreSQL

  Boot->>QS: scheduleWorkflowScan every 5m
  CRM->>QS: addWorkflowEventJob with leadIds
  QS->>R: enqueue workflowScan and workflowEvent jobs
  R->>WW: reserve workflowScan
  WW->>ENG: runTriggerScan
  ENG->>PG: find matching leads and create enrollments
  ENG->>QS: addWorkflowStepJob per new enrollment
  R->>WW: reserve workflowStep
  WW->>ENG: processEnrollmentStep
  alt node is a Wait step
    ENG->>QS: re-enqueue workflowStep with delay
  else node is an action
    ENG->>PG: apply action and advance pointer
  end

5. Aarya / Miss Ozone AI-call reconciliation (poll-back)

AI phone calls are dispatched elsewhere and reconciled by repeatable pollers, because the provider result is not available synchronously.

sequenceDiagram
  participant Boot as Boot scheduler
  participant QS as QueueService
  participant R as Redis
  participant AW as aaryaSync.worker
  participant EL as ElevenLabs API
  participant PG as PostgreSQL

  Boot->>QS: scheduleAaryaSync every 15m
  QS->>R: add repeatable aaryaCallSync job
  loop every 15 minutes
    R->>AW: deliver aaryaCallSync job
    AW->>EL: list dispatched-but-unfinished conversations
    alt conversation finished
      EL-->>AW: duration and conversation id and transcript
      AW->>PG: update LeadCallLog and RawLead predicted interest
    else still in progress
      Note over AW: leave for next tick
    end
  end

missOzone.worker follows the same pattern at a tighter cadence — poll-active-calls every 60 s reconciles calls stuck in scheduled/dialing/completed against LiveKit, with an in-handler retry loop for evaluation fetches, and a daily prune-old-calls at 03:30 that NULLs heavy transcript/eval columns and deletes long-failed rows.

6. External LMS / exam sync (per-config repeatable)

sequenceDiagram
  participant Admin as Admin panel
  participant Svc as MrLearn config service
  participant QS as QueueService
  participant R as Redis
  participant SW as mrlearnSync.worker
  participant GR as Graphy API
  participant PG as PostgreSQL

  Admin->>Svc: enable sync config with intervalHours
  Svc->>QS: scheduleMrLearnSync configId intervalHours
  QS->>R: add repeatable job with stable id mrlearn-sync configId
  loop every intervalHours
    R->>SW: deliver mrlearnSync job
    SW->>GR: pull learners and progress
    SW->>PG: upsert learner rows
  end
  Admin->>Svc: trigger manual sync now
  Svc->>QS: triggerMrLearnSyncNow with fast flag
  QS->>R: enqueue one-shot job
  R->>SW: deliver and process immediately

On boot, src/index.ts re-reads MrLearnSyncConfig (enabled / reminderEnabled), MrTestSyncConfig (enabled), and SystemConfig keys (MRLEARN_NEW_STUDENT_SYNC_ENABLED / ..._INTERVAL_HOURS) from Postgres and recreates every Redis-side schedule, so DB config is always the source of truth even after a Redis flush.


Background jobs & async

Scheduling / cron timeline

All cron patterns use tz: 'Asia/Kolkata' (IST) unless noted; every intervals are wall-clock from registration.

Cadence Job(s) Queue Registered by
every 60 s poll-active-calls missOzoneQueue scheduleMissOzonePolling
every 5 min workflowScan workflowQueue scheduleWorkflowScan
every 15 min slotCleanup cleanupQueue scheduleSlotCleanup
every 15 min dashboardKpiCalculation + salesOverviewStatsCalculation kpiQueue scheduleKpiCalculation
every 15 min aaryaCallSync aaryaSyncQueue scheduleAaryaSync
every 15 min leadAutoAssignment (consumer disabled) leadAssignmentQueue scheduleLeadAutoAssignment
19:00 IST (0 19 * * *) remindAssignmentsDue assignmentReminderQueue scheduleAssignmentReminders
23:59 IST (59 23 * * *) dailyWarningProcessing warningQueue scheduleDailyWarningProcessing
02:00 IST (0 2 * * *) backfillApplications databaseQueue scheduleApplicationBackfill
03:30 (30 3 * * *) prune-old-calls missOzoneQueue scheduleMissOzonePrune
00:00 IST (0 0 * * *) generateDailyCards dailyCardsQueue scheduleDailyCards
every 24 h badgeEvaluation badgeEvaluationQueue scheduleBadgeEvaluation(24)
every 24 h studentRiskComputation studentRiskComputationQueue scheduleStudentRiskComputation(24)
every 15 days salaryBenchmarkRefresh salaryBenchmarkQueue scheduleSalaryBenchmarkRefresh
per-config every intervalHours mrlearnSync / mrtestSync mrlearnSyncQueue / mrtestSyncQueue scheduleMrLearnSync / scheduleMrTestSync
per-config every reminderIntervalHours mrlearnReminder mrlearnReminderQueue scheduleMrLearnReminder
SystemConfig-driven every Nh mrlearnNewStudentSync mrlearnNewStudentSyncQueue scheduleMrLearnNewStudentSync
flowchart LR
  T60["60s: missOzone poll"]
  T5["5m: workflow scan"]
  T15["15m: cleanup, kpi x2, aaryaSync, leadAssign"]
  D1["daily: dailyCards 00:00, backfill 02:00, prune 03:30, assignReminder 19:00, warning 23:59"]
  H24["24h: badges, student risk"]
  D15["15d: salary benchmark"]
  CFG["per-config: mrlearn/mrtest sync, mrlearn reminder, new-student sync"]

Two scheduling styles coexist: repeat: { every: ms } (fixed interval from registration) and repeat: { pattern: cron, tz } (wall-clock cron). Per-config and per-flag schedules use a stable jobId (e.g. mrlearn-sync-<configId>) so re-registration replaces rather than duplicates.

Worker concurrency & retention summary

Property Value(s) Where
Concurrency 5 email, whatsapp, resumeAnalysis, workflow each worker opts.concurrency
Concurrency 4 missOzone missOzone.worker
Concurrency 3 database database.worker
Concurrency 1 all remaining workers their worker files
Rate limiter emailQueue only — max 10 per 1000 ms email.worker
Retention removeOnComplete / removeOnFail ranging 10-200 per queue QueueService defaultJobOptions
Retries / backoff none configured (attempts defaults to 1) absent from all defaultJobOptions

Socket events & webhooks

This domain has no Socket.IO events or inbound webhooks of its own. (Socket.IO lives in src/socket.ts for meetings; provider callbacks for Exotel/Razorpay/Leegality are HTTP routes owned by their feature docs.) AI-call results are obtained by polling workers (aaryaSync, missOzone, resumeAnalysis ai-call-check), not by webhook.


External integrations

System Used by worker(s) Env / config Failure / fallback
Redis all queues + workers REDIS_HOST, REDIS_PORT (defaults localhost:6379); password/db commented out RedisService uses lazyConnect + reconnect-on-READONLY; if Redis is down at boot, initializeRedis() throws and the process exits.
Gmail SMTP (Nodemailer) email.worker EMAIL_USER, EMAIL_PASS Auth failures (535) throw → job marked failed (no auto-retry); see Memory note on rotating the app password and replaying stuck jobs.
ElevenLabs (Aarya) aaryaSync.worker, resumeAnalysis.worker (ai-call) provider keys (feature env) Single-flight concurrency: 1 on aaryaSync to avoid hammering; unfinished conversations are simply re-polled next tick.
LiveKit (Miss Ozone) missOzone.worker LiveKit base/API env In-handler retry loop for evaluation fetch (gives up after N attempts and logs); stuck calls reconciled every 60 s.
Graphy / EzExam mrlearnSync, mrlearnReminder, mrlearnNewStudentSync, mrtestSync per-config rows in Postgres (MrLearnSyncConfig, MrTestSyncConfig) + SystemConfig flags Per-config schedules re-derived from DB on boot; a failing sync fails its job only.
WhatsApp whatsapp.worker, mrlearnReminder.worker WhatsApp provider env Unknown job type throws (job fails).
AWS S3 resumeAnalysis.worker AWS_* env Used to read resume objects; fetch errors fail the job.
Bull Board UI dashboard BULL_BOARD_USERNAME, BULL_BOARD_PASSWORD ( (credentials set via env vars)) Basic-auth only — change credentials in any non-trivial environment.

Feature flags / conditional schedules:

  • MRLEARN_NEW_STUDENT_SYNC_ENABLED (SystemConfig) toggles the global new-student sync; when false the boot code calls unscheduleMrLearnNewStudentSync().
  • MrLearnSyncConfig.enabled / .reminderEnabled and MrTestSyncConfig.enabled gate the per-config schedules.
  • leadAssignment.worker import is commented out in src/index.ts — the schedule exists but is unconsumed (inferred dead path).

Status lifecycles

BullMQ job lifecycle

Because no queue configures attempts/backoff, the retry branch below is available in BullMQ but not exercised by this codebase's default options — a failed handler goes straight to failed and is retained per removeOnFail. (Workers that "retry" do so with their own in-handler loops or by re-enqueueing follow-up jobs.)

stateDiagram-v2
  [*] --> waiting: queue.add
  waiting --> delayed: job has delay or repeat
  delayed --> waiting: delay elapses
  waiting --> active: worker reserves job
  active --> completed: handler resolves
  active --> failed: handler throws
  failed --> waiting: only if attempts remain
  completed --> [*]: trimmed by removeOnComplete
  failed --> [*]: trimmed by removeOnFail

Repeatable-job registration lifecycle

Every schedule*() helper follows the same idempotent pattern so boots never stack duplicate schedules.

stateDiagram-v2
  [*] --> Scanning: schedule called on boot
  Scanning --> Removing: getRepeatableJobs finds prior key
  Removing --> Adding: removeRepeatableByKey
  Scanning --> Adding: no prior key
  Adding --> Scheduled: add with repeat and stable jobId
  Scheduled --> [*]
  Scheduled --> Removing: unschedule called for disabled config

Edge cases, limits & gotchas

  • No automatic retries. No queue sets attempts or backoff, so the effective attempt count is 1. A transient failure (SMTP blip, provider timeout) drops the job to failed permanently; recovery is manual via Bull Board "retry" or a re-enqueue. Workers needing resilience implement their own loops (missOzone evaluation fetch) or re-enqueue follow-ups (resumeAnalysis ai-call-check).
  • Cleanup comment vs reality. scheduleSlotCleanup is documented as "every 24 hours" but actually registers every: 15 * 60 * 1000 (15 minutes). Trust the code.
  • leadAssignmentQueue has a producer + schedule but no consumer — the worker import is commented out in src/index.ts. Scheduled leadAutoAssignment jobs will pile up in waiting (inferred).
  • notificationQueue has a producer (addNotificationJob) and appears in Bull Board but has no worker on this branch (inferred).
  • Bull Board visibility gap. getQueues() returns 19 of 22 queues; warningQueue, badgeEvaluationQueue, and studentRiskComputationQueue are not shown in the dashboard. Their workers still run.
  • Idempotency relies on stable jobIds. Per-config/global schedules (mrlearn-sync-<id>, badge-evaluation-daily, student-risk-daily, mrlearn-new-student-sync, workflow-scan) replace rather than duplicate on re-registration. One-shot manual jobs do not dedupe — repeated manual triggers enqueue repeated jobs.
  • Each worker opens its own Redis connection (inline host/port), separate from the cache RedisService. There is no shared connection pool; a Redis outage affects producers and all workers alike.
  • Workers initialise the DB lazily. Most workers call DatabaseService.getInstance().initialize() and guard the handler with an if (!service) throw until the connection is ready. A job that arrives in that startup window throws and fails.
  • Hardening note (internal). A security/hardening observation for this area is tracked in the team's private notes (internal/security-and-hardening-notes.md) and is intentionally not published on this site.
  • Timezone. Cron schedules pin tz: 'Asia/Kolkata'; every-based schedules are timezone-agnostic wall-clock intervals from the moment of registration (i.e. effectively from each boot).
  • Graceful shutdown (SIGINT/SIGTERM in src/index.ts) closes Socket.IO, the HTTP server, the DataSource, and the cache Redis client — it does not explicitly close() the BullMQ workers, so in-flight jobs may be interrupted rather than drained (inferred).