Onboarding Text Verification Stabilization
Summary
- The current onboarding flow is a code-claim flow, not a standard OTP flow: `POST /api/phone/verify/initiate` creates a `CONSUL-XXXXXX` code in `phone_verification_codes` with a 15-minute TTL, the app opens a text deeplink, inbound messaging calls `POST /api/phone/verify/claim`, and a successful claim sets `profiles.phone_number`, `profiles.phone_verified = true`, and advances onboarding.
- Production impact is real and measurable: `public.profiles` currently has 22 users with `onboarding_completed = false` or `phone_verified = false`, 7 users in `awaiting_phone_verification`, and 6 expired unconsumed onboarding codes. Chloe is in that cohort with `phone_verified = false`, `onboarding_state.status = awaiting_phone_verification`, no phone number, and an expired code.
- Confirmed / strongly supported failure points:
  - The onboarding API uses an env-backed number, but other UI surfaces still hard-code a different phone number, so env fixes can still leave stale user-visible numbers.
  - Expired and invalid code claims are collapsed into the same generic error.
  - Unknown or malformed inbound verification texts do not get an instructional reply and can fall through to prospect routing.
  - The current `sms:${number}&body=${code}` deeplink is fragile and should not be relied on for body prefill.
- Execute in this order: config hardening, verification lifecycle fixes, inbound fallback handling, stuck-user recovery, then SMS OTP evaluation.
Implementation Changes
1. Canonicalize the Consul phone number
- Add a single shared server-side helper for the verification number and use it everywhere user-facing.
- Make `CONSUL_VERIFY_NUMBER` the canonical env var for onboarding. Keep `PHOTON_NUMBER` and `CONSUL_IMESSAGE_NUMBER` as temporary aliases for one release, but log an error if multiple values are set and differ.
- Remove hard-coded number usage from authenticated UI surfaces and route all displayed/copied numbers through the canonical helper or server-provided data.
- During execution, verify the live Vercel project `generative-inc/consul-agent` has the same number in production, preview, and development. Treat the local `.vercel/project.json` as stale for this audit.
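The alias rule above can be sketched as a small resolver. This is a minimal illustration, not the codebase's actual helper: the function name `resolveVerifyNumber`, its signature, and the choice to throw when nothing is configured are all assumptions. Only the three env var names come from this plan.

```typescript
// Hypothetical helper; name, signature, and error behavior are assumptions.
type Env = Record<string, string | undefined>;

const CANONICAL_VAR = "CONSUL_VERIFY_NUMBER";
const ALIAS_VARS = ["PHOTON_NUMBER", "CONSUL_IMESSAGE_NUMBER"];

export function resolveVerifyNumber(
  env: Env,
  logError: (message: string) => void = (m) => console.error(m),
): string {
  // Collect every configured value, canonical variable first.
  const configured = [CANONICAL_VAR, ...ALIAS_VARS]
    .map((name) => ({ name, value: env[name]?.trim() }))
    .filter((entry): entry is { name: string; value: string } => !!entry.value);

  if (configured.length === 0) {
    throw new Error(`${CANONICAL_VAR} is not set`);
  }

  // Log an error when the configured variables disagree.
  const distinct = new Set(configured.map((entry) => entry.value));
  if (distinct.size > 1) {
    const detail = configured.map((e) => `${e.name}=${e.value}`).join(", ");
    logError(`Verification number mismatch: ${detail}`);
  }

  // The canonical variable wins when present; otherwise the first alias set.
  return configured[0].value;
}
```

Values are compared as raw strings here, so two spellings of the same number (for example with and without spaces) would be reported as a mismatch. Normalizing before comparison is a possible refinement.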
2. Fix the onboarding verification lifecycle
- Keep the current 15-minute TTL for the hotfix.
- Update `POST /api/phone/verify/initiate` so that every onboarding restart:
  - expires any prior unconsumed onboarding codes for the current user,
  - creates one fresh code,
  - sets onboarding state to `step = 2`, `status = awaiting_phone_verification`,
  - returns `{ code, consulNumber, deepLink, expiresAt }`.
- Change the deeplink strategy to recipient-only launch. Do not depend on body prefill. The UI should always display the exact `CONSUL-XXXXXX` code and the destination number with explicit copy/manual-send instructions.
- Update `GET /api/phone/verify/status` to return a structured state for the current user: `idle`, `pending`, `expired`, or `verified`, plus `phoneNumber` and `expiresAt` when relevant.
- Update `POST /api/phone/verify/claim` to return structured failure reasons instead of only `"Invalid or expired verification code"`. Use at least: `expired`, `invalid`, `already_consumed`, `phone_already_bound`, and `verified`.
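One way to express the status and claim rules above is as pure functions over a code row, sketched below. The row shape, the function names, the order of the checks, and the reading of `verified` as "this user is already verified" are assumptions for illustration. The status and reason strings are the ones listed in this plan.

```typescript
// Hypothetical row shape; real column names may differ.
interface CodeRow {
  code: string;
  userId: string;
  expiresAt: Date;
  consumedAt: Date | null;
}

type VerifyStatus = "idle" | "pending" | "expired" | "verified";
type ClaimFailure =
  | "expired"
  | "invalid"
  | "already_consumed"
  | "phone_already_bound"
  | "verified";
type ClaimResult =
  | { ok: true; userId: string }
  | { ok: false; reason: ClaimFailure };

// Status for the current user, given their most recent code (if any).
export function deriveStatus(
  phoneVerified: boolean,
  latest: CodeRow | null,
  now: Date,
): VerifyStatus {
  if (phoneVerified) return "verified";
  if (!latest || latest.consumedAt) return "idle";
  return latest.expiresAt.getTime() <= now.getTime() ? "expired" : "pending";
}

interface ClaimContext {
  userAlreadyVerified: boolean; // assumed meaning of the "verified" reason
  phoneBoundToOtherUser: boolean; // sender's number is on another profile
}

// Decide the outcome of a claim without touching the database.
export function evaluateClaim(
  row: CodeRow | null,
  ctx: ClaimContext,
  now: Date,
): ClaimResult {
  if (!row) return { ok: false, reason: "invalid" };
  if (ctx.userAlreadyVerified) return { ok: false, reason: "verified" };
  if (row.consumedAt) return { ok: false, reason: "already_consumed" };
  if (row.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, reason: "expired" };
  }
  if (ctx.phoneBoundToOtherUser) {
    return { ok: false, reason: "phone_already_bound" };
  }
  return { ok: true, userId: row.userId };
}
```

Keeping the decision separate from the database writes makes each failure reason straightforward to cover in API tests.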
3. Add user-facing fallback handling
- In the onboarding step, replace the current generic timeout UX with explicit expired-state handling. When polling sees `expired`, stop polling, show “Code expired after 15 minutes,” and expose a `Get new code` action that calls `initiate` again.
- Keep step 2 as the phone gate. On successful verification, continue advancing to step 3 exactly as today.
- Update product copy to match the actual format. The app should tell users to send the exact `CONSUL-XXXXXX` code, not a bare 6-digit code.
- Improve the step 2 fallback UI so it works even if the deeplink does nothing: visible number, visible code, copy actions, retry/regenerate action, and no reliance on the messaging app opening successfully.
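The polling and fallback behavior above could be driven by a small mapping from status to view state, sketched here. The type and function names (`Step2View`, `step2View`, `recipientOnlyDeepLink`) and all message copy except the expired message are hypothetical, not existing code.

```typescript
type VerifyStatus = "idle" | "pending" | "expired" | "verified";
type Step2Action = "copy_code" | "copy_number" | "get_new_code";

// Hypothetical view model for onboarding step 2.
interface Step2View {
  keepPolling: boolean;
  message: string;
  actions: Step2Action[];
  advanceToStep3: boolean;
}

export function step2View(status: VerifyStatus): Step2View {
  switch (status) {
    case "pending":
      // Number and code stay visible, so the step works without the deeplink.
      return {
        keepPolling: true,
        message: "Text the exact CONSUL-XXXXXX code shown here to the number shown here.",
        actions: ["copy_code", "copy_number", "get_new_code"],
        advanceToStep3: false,
      };
    case "expired":
      // Stop polling and offer regeneration instead of a generic timeout.
      return {
        keepPolling: false,
        message: "Code expired after 15 minutes",
        actions: ["get_new_code"],
        advanceToStep3: false,
      };
    case "verified":
      return { keepPolling: false, message: "Phone verified.", actions: [], advanceToStep3: true };
    case "idle":
      return {
        keepPolling: false,
        message: "Get a code to verify your phone.",
        actions: ["get_new_code"],
        advanceToStep3: false,
      };
  }
}

// Recipient-only launch: no body parameter, so nothing depends on prefill.
export function recipientOnlyDeepLink(consulNumber: string): string {
  return `sms:${consulNumber.replace(/[^\d+]/g, "")}`;
}
```

For example, `recipientOnlyDeepLink("+1 (555) 010-0000")` yields `sms:+15550100000`; the number here is a placeholder.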
4. Handle malformed and stale inbound texts
- In the messaging resolver, keep claim attempt first for code-like messages from unknown numbers.
- If claim fails with `expired`, send an instructional reply instead of falling through: tell the user to return to the app and generate a new code.
- If the message looks like a verification attempt but is malformed or invalid, send an instructional reply telling the user to send the exact `CONSUL-XXXXXX` code from the app.
- Preserve existing prospect routing only for non-onboarding unknown-number conversations. Do not send prospect replies for short code-like or verification-like messages.
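The routing rules above depend on telling verification attempts apart from ordinary messages. The sketch below shows one possible heuristic. The regular expressions, the assumption that the six `X` characters are uppercase letters or digits, the case-insensitive matching, and the reply wording are all illustrative guesses that should be checked against the real code generator.

```typescript
type Route =
  | { action: "claim"; code: string }
  | { action: "reply"; text: string }
  | { action: "prospect" };

// Assumed code alphabet: six uppercase letters or digits after "CONSUL-".
const EXACT_CODE = /^CONSUL-[A-Z0-9]{6}$/;
// Starts with the prefix but is not a well-formed code (wrong separator or length).
const MALFORMED_PREFIXED = /^CONSUL[-\s:_]*[A-Z0-9]{0,10}$/;
// A bare six-character token containing at least one digit.
const BARE_CODE_LIKE = /^(?=.*\d)[A-Z0-9]{6}$/;

const SEND_EXACT_CODE =
  "Please send the exact CONSUL-XXXXXX code shown in the app.";
const CODE_EXPIRED =
  "That code has expired. Return to the app and generate a new code.";

// First-pass routing for a message from an unknown number.
export function routeUnknownSender(body: string): Route {
  const normalized = body.trim().toUpperCase();
  if (EXACT_CODE.test(normalized)) {
    return { action: "claim", code: normalized };
  }
  if (MALFORMED_PREFIXED.test(normalized) || BARE_CODE_LIKE.test(normalized)) {
    return { action: "reply", text: SEND_EXACT_CODE };
  }
  return { action: "prospect" };
}

// Reply to send when a well-formed code was submitted but the claim failed.
export function replyForFailedClaim(reason: "expired" | "invalid"): string {
  return reason === "expired" ? CODE_EXPIRED : SEND_EXACT_CODE;
}
```

The heuristic is deliberately narrow: a message such as "Hi, is this Consul?" still goes to prospect routing, while `CONSUL AB12CD` or a bare `482913` gets the instructional reply.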
5. Recover the currently stuck users
- Use `public.profiles` as the source of truth, not an app-level `users` table.
- Add an admin recovery script or runbook that can:
  - list stuck users with email, name, onboarding state, phone verification state, and latest code status,
  - expire any unconsumed onboarding codes for selected users,
  - reset selected users to `step = 2`, `status = awaiting_phone_verification`,
  - leave `phone_verified = false`,
  - leave any existing `phone_number` untouched for this first recovery pass.
- Run that recovery path for the 7 users currently in `awaiting_phone_verification`, including Chloe, after the hotfix is deployed.
- Treat the separate issue of unverified numbers being written outside the verification flow as follow-up hardening, not the blocker for the immediate fix.
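The recovery steps above can be prototyped as pure functions before any database access is wired in. The sketch below works on in-memory rows; the field names, the `createdAt` column, and the function names are assumptions, and the real script would read from `public.profiles` and `phone_verification_codes`.

```typescript
// Hypothetical in-memory shapes; real columns may be named differently.
interface Profile {
  id: string;
  email: string;
  name: string;
  phoneVerified: boolean;
  phoneNumber: string | null;
  onboardingState: { step: number; status: string };
}

interface Code {
  id: string;
  userId: string;
  createdAt: Date;
  expiresAt: Date;
  consumedAt: Date | null;
}

type CodeStatus = "none" | "active" | "expired" | "consumed";

function statusOf(code: Code | undefined, now: Date): CodeStatus {
  if (!code) return "none";
  if (code.consumedAt) return "consumed";
  return code.expiresAt.getTime() <= now.getTime() ? "expired" : "active";
}

// List unverified users waiting on phone verification, with latest code status.
export function listStuckUsers(profiles: Profile[], codes: Code[], now: Date) {
  return profiles
    .filter(
      (p) =>
        !p.phoneVerified &&
        p.onboardingState.status === "awaiting_phone_verification",
    )
    .map((p) => {
      const latest = codes
        .filter((c) => c.userId === p.id)
        .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())[0];
      return {
        id: p.id,
        email: p.email,
        name: p.name,
        onboardingState: p.onboardingState,
        phoneVerified: p.phoneVerified,
        latestCodeStatus: statusOf(latest, now),
      };
    });
}

// Describe the writes for one user; phone_number is deliberately not in the patch.
export function planRecovery(userId: string, codes: Code[]) {
  return {
    userId,
    expireCodeIds: codes
      .filter((c) => c.userId === userId && !c.consumedAt)
      .map((c) => c.id),
    profilePatch: {
      onboardingState: { step: 2, status: "awaiting_phone_verification" },
      phoneVerified: false,
    },
  };
}
```

Producing a plan object first lets an operator review exactly which codes and profiles would change before anything is written.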
6. Evaluate SMS OTP as the long-term replacement
- Mark this as a product decision, not part of the hotfix branch.
- Produce a short implementation proposal for a standard SMS OTP flow:
  - user enters phone number,
  - backend sends OTP through a provider,
  - user enters OTP in-app,
  - backend verifies and sets `profiles.phone_verified = true`.
- Compare tradeoffs explicitly: per-SMS cost and provider dependency versus much higher reliability and no dependence on deeplinks or messaging-app behavior.
Test Plan
- API tests:
  - onboarding `initiate` expires prior active codes and returns a fresh `expiresAt`,
  - `status` transitions `idle -> pending -> verified` and `pending -> expired`,
  - `claim` returns each structured failure reason and updates `profiles`/onboarding on success.
- Messaging tests:
  - successful claim from an unknown phone verifies and short-circuits routing,
  - expired and malformed verification texts get instructional replies,
  - ordinary unknown-number messages still follow prospect behavior.
- UI tests:
  - step 2 shows code and number after initiation,
  - expired status surfaces a regenerate action,
  - successful verification advances onboarding from step 2 to step 3.
- Live smoke tests:
  - one preview and one production end-to-end run with a real device and the real Consul number,
  - confirm inbound processing, `phone_verification_codes.consumed_at`, and `profiles.phone_verified = true`,
  - run the recovery script on one known-stuck user and verify the regenerated code path works.
Assumptions and Defaults
- Keep the current 15-minute onboarding TTL for now.
- Keep the current `CONSUL-XXXXXX` code format for the hotfix.
- Use the live Vercel project `generative-inc/consul-agent` as the environment source of truth; the repo-local Vercel link is not the live project for this flow.
- Direct env confirmation in Vercel still requires authenticated access during execution.
- [Product decision] Standard SMS OTP is a follow-up proposal, not part of the immediate unblock.