Add PDF View/Text View tab switcher design report and update plan

Documents the accessibility specialist review of the proposed tab
switcher feature: ARIA tablist/tab/tabpanel pattern requirements,
heuristic HTML parsing strategy for extracted text, signing form
UX recommendation (read-only text view + always-visible Vue panel),
and concrete pitfalls including DaisyUI incompatibility with APG
keyboard model, 15-page cap handling, and RTL text direction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pull/599/head
Marcelo Paiva 3 weeks ago
parent bd6e759203
commit 778a379086

@ -137,6 +137,29 @@
3. **Complete Task 7** (Phase 1 accessibility tests)
4. **Begin Phase 2**: Form error associations and ARIA live regions
---
## Session Summary - 2026-02-25 (follow-up)
### Expert design review: PDF View / Text View tab switcher
Produced detailed design report at `.reports/pdf-text-view-tab-switcher-design.md` covering:
- ARIA tab pattern requirements (roles, keyboard behavior, roving tabindex)
- Text View content strategy: heuristic parsing (Approach B) recommended for MVP
- Signing form UX: read-only Text View + always-visible Vue form panel + sticky "return to sign" CTA
- Scoped implementation sequence (preview page first, then signing form)
- Key pitfalls: DaisyUI radio-tab incompatibility with ARIA APG, 15-page cap handling, `hidden` attribute requirement, RTL `dir="auto"`, text quality disclosure, localStorage state persistence
### Recommended next implementation steps
1. **Create `lib/pdf_text_to_html.rb` service** — heuristic parser converting `pages_text` metadata strings into structured HTML (`<article>`, `<section>`, `<h2>`, `<ol>`, `<ul>`, `<p dir="auto">`)
2. **Add ARIA tab switcher to `submissions/show.html.erb`** — preview page only, no signing complications
3. **Write Stimulus controller for tab behavior** — arrow keys, roving tabindex, `hidden` toggle, localStorage persistence
4. **Verify with VoiceOver + keyboard-only** before touching signing form
5. **Add tab switcher to `submit_form/show.html.erb`** — with sticky "return to sign" CTA inside text panel
6. **Handle 15-page cap**: hide tab entirely if `pages_text` key count < `number_of_pages`
### WCAG 2.2 Criteria Addressed
**1.1.1 Non-text Content (Level A)** - All images now have alt text

@ -0,0 +1,371 @@
# PDF View / Text View Tab Switcher: Accessibility Design Report
**Date**: 2026-02-25
**Branch**: extract-content-from-pdf
**Author**: Accessibility specialist review
**Question origin**: Product team request for "PDF View / Text View" tab switcher on document pages
---
## Prior Art in This Codebase
The previous expert opinion (`pdf-text-visibility-expert-opinion.md`) recommended keeping `sr-only` as the primary path and warned that exposing extracted text to sighted users creates a layout-fidelity trust problem in a legal document signing context. The current implementation stores extracted text in `attachment.metadata['pdf']['pages_text']` and renders it in `sr-only` divs after each page image.
This report evaluates the specific "tab switcher" UI pattern now requested and gives concrete implementation guidance.
---
## 1. The ARIA Tab Pattern: What Is Required
### Is it a well-established pattern?
Yes. The ARIA Authoring Practices Guide (APG) defines the Tabs pattern with a documented keyboard interaction model, and it is one of the most commonly used ARIA patterns. Tabbed interfaces appear in browser DevTools, VS Code settings panels, the GOV.UK design system, and the tab components of most design systems (DaisyUI, shadcn, Headless UI). Shipping a tabs component does not guarantee APG conformance, however; see pitfall 7 for DaisyUI.
### Required ARIA structure
```html
<!-- Tab list container -->
<div role="tablist" aria-label="Document view options">
  <!-- Individual tabs -->
  <button role="tab"
          id="tab-pdf"
          aria-selected="true"
          aria-controls="panel-pdf"
          tabindex="0">
    PDF View
  </button>
  <button role="tab"
          id="tab-text"
          aria-selected="false"
          aria-controls="panel-text"
          tabindex="-1">
    Text View
  </button>
</div>
<!-- Panel for PDF View (active) -->
<div role="tabpanel"
     id="panel-pdf"
     aria-labelledby="tab-pdf"
     tabindex="0">
  <!-- PDF page images + overlaid fields -->
</div>
<!-- Panel for Text View (hidden) -->
<div role="tabpanel"
     id="panel-text"
     aria-labelledby="tab-text"
     tabindex="0"
     hidden>
  <!-- Structured HTML text -->
</div>
```
### Required keyboard behavior (WCAG 2.1.1 + APG)
| Key | Behavior |
|-----|----------|
| Tab | Moves focus INTO the tab list (to the active tab), then OUT OF the tab list to the first focusable element in the active panel |
| Arrow Left / Arrow Right | Moves focus to the previous/next tab within the tablist, wrapping at the ends. With automatic activation the newly focused tab's panel is shown immediately; with manual activation the panel changes only on Enter/Space |
| Home | Focus first tab |
| End | Focus last tab |
| Enter / Space | Activates a tab if using manual activation model |
**Automatic vs. manual activation**: For two-tab switchers with fast content swaps (no async load), automatic activation (content switches as you arrow between tabs) is acceptable. If Text View requires any async work (API call, processing), use manual activation (Tab content only switches on Enter/Space) to prevent jarring focus/content shifts.
### tabindex roving pattern
Only the currently selected tab has `tabindex="0"`. All other tabs have `tabindex="-1"`. This ensures Tab key only hits the active tab, not every tab in sequence. Arrow keys cycle through all tabs.
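The roving pattern and the key table above reduce to a small piece of index arithmetic. The sketch below models it in Ruby purely to make the logic testable. The module and method names are hypothetical, the key strings are the standard `KeyboardEvent.key` values, and the real handler would live in the client-side tab controller rather than in Ruby.

```ruby
# Hypothetical model of the roving tabindex logic; not part of the codebase.
module RovingTabindex
  module_function

  # Index of the tab that should receive focus after `key` is pressed.
  # Arrow keys wrap around at both ends of the tablist.
  def next_index(current, count, key)
    case key
    when 'ArrowRight' then (current + 1) % count
    when 'ArrowLeft'  then (current - 1) % count
    when 'Home'       then 0
    when 'End'        then count - 1
    else current # any other key leaves focus where it is
    end
  end

  # tabindex value for every tab: 0 for the selected tab, -1 for the rest.
  def tabindex_values(selected, count)
    Array.new(count) { |i| i == selected ? 0 : -1 }
  end
end
```

With two tabs, pressing Arrow Right on the last tab wraps to the first, and only the selected tab stays in the page's Tab sequence.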
### Panel tabindex
`tabindex="0"` on the panel div makes it focusable, so pressing Tab from the active tab moves focus to the panel. APG recommends this when the panel contains no focusable elements, or when its first meaningful content is not focusable. If the panel starts with focusable children (form fields, buttons), `tabindex` on the panel itself can be omitted, and focus will land on the first focusable child.
---
## 2. Text View: What Should It Contain
### The honest answer about Pdfium plain text
Pdfium's `FPDFText_GetText` returns Unicode text in content-stream order. For standard contracts produced by Word, Google Docs, or document-generation libraries, content-stream order matches reading order and produces clean prose with `\r\n` line breaks. This is what DocuSeal's actual documents will mostly be.
For the purposes of a text accessibility view, the goal is not "identical representation of the PDF" — it is "readable, searchable, reflowable alternative for users who cannot access the image rendering." That is a different and more achievable bar.
### Evaluation of approaches
#### Approach A: `<pre>` or whitespace-preserved `<p>` tags
**What it delivers**: Raw text with whitespace preserved. No semantic structure. Line breaks from the raw string become visible in `<pre>` or require `white-space: pre-wrap` in `<p>`.
**Accessibility value**: Low. `<pre>` is technically for preformatted text (code, ASCII art). Screen readers announce `<pre>` as a "code block" in some modes. More importantly: a raw text dump with no structure has no heading hierarchy, no navigable sections, no landmark regions — AT users cannot skip to relevant sections.
**Recommendation**: Do not use `<pre>`. Use `<p>` with `white-space: pre-wrap` only as a last resort.
#### Approach B: Heuristic parsing
**What it delivers**: Lines matching `/^[A-Z][A-Z\s]{4,}$/` become `<h2>`, numbered lines become `<ol>`, lines starting with `•` or `-` become `<ul>`, everything else becomes `<p>`.
**Accuracy for legal documents**: Surprisingly good. Standard NDA/contract PDFs produced by Word/Google Docs commonly use all-caps section headings, which the heading rule catches. Title-case headings are not matched by that rule and fall through to `<p>`, which loses structure but not content. Numbered clauses are by definition numbered lines. The heuristic does not need to be perfect; it needs to be better than a wall of unseparated text.
**False positive risk**: A sentence beginning with "1." that is NOT a list item gets wrapped in `<ol>`. This is a minor misrepresentation. In a legal context, the risk is acceptable when the text view is clearly labeled as an accessibility alternative, not a legal copy.
**Recommended rules** (conservative, order matters):
```ruby
# 1. Split on \r\n or \n
# 2. Skip blank lines (preserve as paragraph breaks)
# 3. ALL_CAPS line 3+ chars with no sentence punctuation → <h2>
# 4. Line matching /^\d+\.\s/ → accumulate as <ol><li>
# 5. Line matching /^[•\-\*]\s/ → accumulate as <ul><li>
# 6. Remaining non-blank lines → <p>
```
**Recommendation**: Use this for an MVP. It is sufficient for the 90% case (Word-generated legal documents). It requires ~50 lines of Ruby in a service object.
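A minimal sketch of how the rules above could fit into a service object follows. The module name `PdfTextToHtml` and its method names are assumptions for illustration, not existing code. Two details go beyond the stated rules and are also assumptions: each `<li>` in an `<ol>` carries a `value` attribute so the original clause number survives when a list does not start at 1, and all text is escaped with `ERB::Util.html_escape` before being wrapped in tags.

```ruby
require 'erb'

# Hypothetical module; the name and structure are assumptions for illustration.
module PdfTextToHtml
  NUMBERED = /\A(\d+)\.\s+/
  BULLET   = /\A[\u2022\-*]\s+/ # bullet character, hyphen, or asterisk

  module_function

  def escape(text)
    ERB::Util.html_escape(text)
  end

  # ALL-CAPS line of 3+ characters with no sentence punctuation.
  def heading?(line)
    line.length >= 3 && line.match?(/[A-Z]/) && !line.match?(/[a-z.!?]/)
  end

  # Returns [kind, html] for one stripped, non-blank line. Rule order matters.
  def classify(line)
    return [:h2, "<h2>#{escape(line)}</h2>"] if heading?(line)

    if (m = NUMBERED.match(line))
      return [:ol, %(<li value="#{m[1]}">#{escape(m.post_match)}</li>)]
    end
    if (m = BULLET.match(line))
      return [:ul, "<li>#{escape(m.post_match)}</li>"]
    end

    [:p, %(<p dir="auto">#{escape(line)}</p>)]
  end

  # Groups consecutive list items; a blank line closes any open list.
  def page_html(text)
    blocks = [] # each entry: [kind, [html fragments]]
    text.split(/\r?\n/).each do |raw|
      line = raw.strip
      if line.empty?
        blocks << [:break, []]
        next
      end
      kind, html = classify(line)
      if %i[ol ul].include?(kind) && blocks.last&.first == kind
        blocks.last[1] << html
      else
        blocks << [kind, [html]]
      end
    end
    blocks.map do |kind, parts|
      %i[ol ul].include?(kind) ? "<#{kind}>#{parts.join}</#{kind}>" : parts.join
    end.join
  end

  # pages: Array of page text strings, in page order.
  def call(pages)
    sections = pages.each_with_index.map do |text, i|
      %(<section aria-label="Page #{i + 1}">#{page_html(text)}</section>)
    end
    "<article>#{sections.join}</article>"
  end
end
```

Calling `PdfTextToHtml.call(page_strings)` returns one `<article>` string. Because the sketch escapes the text itself, a Rails view would need to mark the result as HTML-safe before rendering it.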
#### Approach C: LLM conversion (Claude API)
**What it delivers**: Accurate semantic structure, correct heading levels, properly identified tables, clause detection.
**Why it is wrong for this use case**:
1. **Privacy**: Document content (NDA text, employment agreements, financial disclosures) would leave the customer's instance and be sent to an external API. DocuSeal is self-hosted and open-source specifically because enterprises need data sovereignty. This is a non-starter for most DocuSeal deployments.
2. **Latency**: LLM API calls take 2-10 seconds. Text View would have a loading state that breaks the UX promise of an accessibility alternative. AT users would experience worse performance than sighted users.
3. **Cost**: Per-document API costs at scale are non-trivial and inconsistent with the open-source, self-hosted model.
4. **Accuracy caveat**: LLMs hallucinate. For a legal document where the exact wording matters, an LLM that paraphrases or restructures while parsing could introduce errors that sighted reviewers would not catch.
**Recommendation**: Reject entirely for this use case.
#### Approach D: Pdfium layout data (`text_nodes`)
**What it delivers**: Character-level bounding boxes, font sizes, and positions. Can reconstruct reading order, infer heading level from font size, identify columns, detect table cell boundaries from coordinate alignment.
**What the codebase already has**: `Pdfium::Page#text_nodes` (already implemented in `lib/pdfium.rb`) returns `TextNode` structs with `x`, `y`, `w`, `h`, and `content`. `FPDFText_GetFontSize` is already attached. This is the data needed.
**Recommended use**: Font size analysis. Collect all unique font sizes per page, find the median body text size, and flag runs of text where font size > 120% of median as headings. This is a 30-40 line addition to the processing pipeline.
**Current limitation**: `text_nodes` is per-page and re-computed on demand (not cached in metadata). The processing pipeline already calls `page.text` (the plain string); calling `page.text_nodes` additionally during `extract_page_texts()` would add processing time at document upload.
**Recommendation**: Use for a Phase 2 enhancement if heuristic parsing proves insufficient. Do not block the MVP on this.
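The font-size analysis described for Approach D could look roughly like the sketch below. It assumes a hypothetical `SizedRun` struct that pairs text with a font size; the existing `TextNode` is described as carrying only `x`, `y`, `w`, `h`, and `content`, so the `font_size` field is an assumption. The sketch also weights the median by character count rather than taking the median of unique sizes, which is a design choice of this example, made so that a handful of large headings cannot shift the inferred body size.

```ruby
# Hypothetical struct: `font_size` is assumed; it is not a field of the existing TextNode.
SizedRun = Struct.new(:content, :font_size)

# Hypothetical module for illustration only.
module FontSizeHeadings
  HEADING_RATIO = 1.2 # runs larger than 120% of body size count as headings

  module_function

  # Median font size weighted by character count (upper median for even counts).
  def body_font_size(runs)
    sizes = runs.flat_map { |run| [run.font_size] * run.content.length }.sort
    return nil if sizes.empty?

    sizes[sizes.length / 2]
  end

  # Runs whose font size exceeds the body size by the heading ratio.
  def heading_runs(runs)
    body = body_font_size(runs)
    return [] unless body

    runs.select { |run| run.font_size > body * HEADING_RATIO }
  end
end
```

On a page where most characters are set at 11pt, runs at 14pt and 18pt would be flagged, while 12pt emphasis text would not.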
### Minimum viable semantic structure
For the MVP, deliver:
1. A single `<article>` per document (not per page) with `lang` attribute matching document locale
2. Per-page sections as `<section aria-label="Page N">`, which exposes one named region landmark per page
3. Heuristic parsing converting all-caps lines to `<h2>`, numbered lines to `<ol>`, bullets to `<ul>`, rest to `<p>`
4. Full-document single scroll (NOT paginated); see Section 4 for rationale
This is readable, navigable by AT users, and can be produced entirely from the stored `pages_text` metadata without any additional Pdfium calls.
---
## 3. Signing Form Complication
### The core problem
In the signing form (`submit_form/show.html.erb`), the sticky bottom panel contains the Vue 3 submission form component that drives the signing workflow — field navigation, signature capture, completion. This Vue component reads the DOM for `page-container` elements to drive its scroll-to-field logic. A tab switch that replaces the PDF panel with a text panel would break the Vue component's DOM assumptions.
Additionally, in the signing context, the legally relevant representation is the PDF image. The signer is attesting to the document as visually presented. Replacing that with extracted text in the signing flow creates the same trust problem identified in the previous expert opinion.
### Evaluation of options
#### Option A: Text View is read-only; switch back to PDF View to sign
**UX model**: Two-mode. User can read in Text View, then explicitly return to PDF View to complete fields.
**Accessibility impact**: For AT users (screen reader, keyboard-only), this is workable. The tab switch is a clear, explicit action. However, it creates a redundant navigation burden: AT user reads text in Text View, must switch back to PDF View, must re-navigate to the first incomplete field.
**Implementation complexity**: Low. Text panel has no form fields. Vue component remains in PDF panel and is not affected.
**Verdict**: Acceptable for MVP. The return-to-sign burden is real but not severe for a two-page NDA. For a 40-page complex form, it is a problem.
#### Option B: Inline form fields in Text View
**What it would require**: Mapping each form field's page/coordinates to a position in the heuristic-parsed text, then inserting Vue field components at those positions. This requires coordinate-to-text-position mapping that the current data model does not support — you would need to store character bounding boxes (Approach D data) and match field bounding boxes (from `fields_index` areas) to character positions.
**Verdict**: Reject for MVP. Engineering effort is 2-3 weeks minimum. The data infrastructure does not exist. This is a future Phase 3 investment if the product team commits to it.
#### Option C: Text View with sticky "Continue Signing" CTA
**What it is**: Text View shows the document text. At the bottom (sticky) is a persistent call-to-action: "Return to PDF View to complete signing" that scrolls or switches to the PDF tab.
**Accessibility impact**: This is good. AT users can read the full document without losing context, then explicitly navigate to the completion action. The CTA is focusable, labeled, and persistent.
**Implementation complexity**: Low. The sticky CTA is just a button in the Text panel that calls the tab-switch function and scrolls to the first incomplete field.
**Verdict**: This is the right choice for the signing form MVP. Minimal engineering, clear user model, accessible.
#### Option D: Text View as a drawer/panel, not a full tab replace
**What it is**: Text View does not replace the PDF panel. Instead, a side panel or overlay drawer shows the text alongside the PDF. On mobile, it would be a bottom sheet.
**Problem in signing context**: The signing form already has a fixed bottom panel (the Vue submission form). A text drawer would compete for vertical space on mobile.
**Problem in general**: The `submissions/show.html.erb` already has a three-column layout (document thumbnail sidebar + document view + parties sidebar). Adding a fourth pane is not feasible.
**Verdict**: Reject. The existing layout does not have room for a persistent text panel alongside the PDF on the signing form. On the submission preview, the right panel (parties view) already serves a different purpose.
### Recommendation for signing form
Use **Option A + Option C combined**:
- Text View in the signing form is read-only
- A sticky non-scrolling CTA banner at the bottom of the Text panel (inside the tabpanel, positioned sticky within it) says "Ready to sign? Switch back to PDF View" with a button that activates the PDF tab
- The Vue submission form panel (at the page level, outside the tabpanel) is unaffected and remains visible at all times (in both tab states) so the user always knows signing is available
This means the signing form's tab behavior is:
- **PDF View tab active**: Normal signing experience, Vue form panel at bottom
- **Text View tab active**: Text content, Vue form panel still at bottom (always visible), text panel has a banner pointing back to PDF View
The Vue form panel being always-visible in both tab states means switching to Text View does not break the signing workflow — it just replaces the document image area with the text area, while the signing controls remain accessible.
---
## 4. Scoped Recommendation
### Right scope: both pages, but different behavior
**Submission preview (`submissions/show.html.erb`)**: Full "PDF View / Text View" tab switcher. Text View is read-only, full-document single scroll. This is the simplest case — no signing complications. Prioritize this page first.
**Signing form (`submit_form/show.html.erb`)**: Tab switcher with read-only Text View and sticky "return to sign" CTA. The Vue submission form panel remains visible in both views. Implement second, after preview is stable.
**Template builder**: Do not add a Text View. The builder is for document authors who need to see the visual layout for field placement. Text View is not relevant to their task.
### Minimum implementation that delivers real value
The users who benefit most are:
1. **Low-vision users not using screen readers** — they can zoom text, use browser translate, use dyslexia fonts via browser extensions. All require visible DOM text.
2. **Cognitive disability users** — simplified, reflowed text without visual PDF complexity reduces cognitive load when reading a contract before signing.
3. **Language barrier users** — browser auto-translate works on visible DOM text. A French speaker receiving an English NDA can press translate and read a machine-translated version before signing.
4. **Mobile users on slow connections** — text loads instantly, images may not. Text View as a fallback for poor network conditions is a real practical benefit.
None of these users are served by the current `sr-only` implementation. The tab switcher is the correct minimal feature to serve them.
### Per-page vs. full-document single scroll
**Single scroll is strongly preferred.** Here is why:
1. **AT navigation**: Screen reader users navigate long text by headings (H key in JAWS/NVDA). A full-document single article with heading structure lets them jump to "Section 3" of a contract without pagination friction. Per-page tabs or pagination destroys this.
2. **Browser Find in Page**: Works across the entire visible document. If content is paged, Cmd+F only searches the visible page. A user searching for "indemnification" would not find it on page 4 if they are viewing page 1.
3. **Browser translate**: Chrome's Page Translate works on the visible DOM. A paged text view may not translate the full document in one pass.
4. **Copy/paste**: Users who want to copy a clause from a multi-page document do not want to page through it; they want to Cmd+A or select-and-copy across a continuous document.
5. **Cognitive load**: Pagination introduces navigation overhead. Users with cognitive disabilities benefit from fewer controls, not more.
**Single scroll with per-page section headings** gives the best of both worlds: the document reads as a continuous flow (good for linear reading), but AT users can jump to "Page 3" section marker (good for navigation) and sighted users can scroll normally.
### Pitfalls the team might miss
#### 1. The tab switcher must remember state across Turbo navigation
DocuSeal uses Turbo Drive. Without persistence, if the user switches to Text View and Turbo then navigates away and back, the tab state resets to PDF View. Defaulting to PDF View on a fresh page load is acceptable, but the behavior should be a deliberate choice rather than an accident. To remember the user's preference, store it in `localStorage` and restore it on `turbo:load`. Do not use a cookie: it is sent to the server with every request and affects the legal audit trail unnecessarily.
#### 2. The hidden panel must use `hidden` attribute or `display: none`, not `visibility: hidden`
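`visibility: hidden` does remove an element from the accessibility tree and the tab order, but the element still occupies layout space, so an inactive panel would leave a blank gap. Purely visual hiding (`opacity: 0`, off-screen positioning, zero height with clipped overflow) is worse: the content stays in the accessibility tree and its focusable children stay in the tab order, so AT users would navigate into a "hidden" panel. Use the `hidden` HTML attribute (which maps to `display: none`) on inactive panels, and check that no CSS utility class overrides its `display`. `aria-hidden="true"` alone is not sufficient, because it neither hides the panel visually nor removes its focusable children from the tab order. The ARIA tab pattern expects only the active `tabpanel` to be displayed.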
DaisyUI's tab component may not do this correctly out of the box — verify before shipping.
#### 3. Focus management on tab switch
When the user clicks/keyboards to a new tab, focus should remain on the newly activated tab button — NOT move into the panel content. The panel becomes accessible by pressing Tab from the active tab. Do not auto-focus the panel on activation.
If the panel has no focusable children, pressing Tab from the active tab should move focus to the panel itself, which is why the panel carries `tabindex="0"`. This is standard APG behavior.
#### 4. Text quality disclosure
Somewhere in the Text View — either a brief banner or a tooltip on the tab button — inform users that "Text View provides an accessible alternative. The PDF View is the authoritative document." This is not a legal disclaimer (that is overkill) but a brief user-facing note that sets correct expectations. Example: `<p class="text-sm text-base-content/60 mb-4">This text representation is provided for accessibility. The PDF view is the signed document.</p>` at the top of the text panel.
#### 5. The 15-page extraction cap
`MAX_NUMBER_OF_PAGES_PROCESSED = 15` means documents with more than 15 pages will have no text for pages 16+. The Text View must handle this gracefully. Options:
- Show text for pages 1-15, then an "i" info message: "Text not available for pages 16 and beyond"
- Do not show the Text View tab at all if the document exceeds 15 pages (simpler, avoids partial text confusion)
The second option is more conservative and avoids the misleading "there is text but only for some pages" situation. Recommendation: hide the tab if `pages_text` keys count < `number_of_pages`. This is a simple Ruby check.
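The check could be sketched as below. The helper name is hypothetical, and reading `number_of_pages` from the same `metadata['pdf']` hash as `pages_text` is an assumption; the report does not say where that value is stored. The sketch also assumes every processed page gets a key in `pages_text`, even when its text is empty.

```ruby
# Hypothetical helper; `metadata` mirrors the blob metadata hash described in this report.
# Assumes 'number_of_pages' sits next to 'pages_text' under the 'pdf' key.
def text_view_available?(metadata)
  pdf = metadata.fetch('pdf', {})
  pages_text = pdf['pages_text'] || {}
  number_of_pages = pdf['number_of_pages'].to_i

  # Show the tab only when every page of the document has extracted text.
  number_of_pages.positive? && pages_text.size >= number_of_pages
end
```

A view could wrap the Text View tab and panel in `if text_view_available?(...)` so that documents over the cap fall back to the PDF-only experience.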
#### 6. RTL document handling
The `dir="auto"` attribute on paragraph elements is essential for documents that mix Hebrew, Arabic, or Persian text (which DocuSeal's multilingual user base may encounter). `<p dir="auto">` lets the browser infer text direction per-paragraph. Without it, RTL text in an LTR container renders as reversed word-soup.
#### 7. Do not use `role="tablist"` + DaisyUI radio tabs
DaisyUI's tab pattern uses `<input type="radio">` and CSS `:checked` pseudo-selectors, not the ARIA tab pattern. This is a completely different interaction model and does NOT satisfy APG keyboard behavior. If using DaisyUI, you must either:
a. Override DaisyUI tabs with a JavaScript-driven ARIA tab implementation, or
b. Use the `<div role="tablist">` + `<button role="tab">` pattern independently of DaisyUI
DaisyUI's radio-based tabs are NOT keyboard-navigable in the ARIA-specified way (arrow keys do not work as tabs; spacebar selects the radio). Attempting to bolt ARIA roles onto DaisyUI tab markup without custom JavaScript will produce an incorrect implementation.
---
## 5. Suggested Implementation Sequence
### Step 1: Ruby service for text-to-HTML conversion (no UI yet)
Create `lib/pdf_text_to_html.rb` (or `app/helpers/pdf_text_html_helper.rb`):
```ruby
# Input: Array of page text strings (from pages_text metadata)
# Output: HTML string suitable for rendering in a tabpanel
#
# Rules:
# - Wrap output in <article>
# - Each page → <section aria-label="Page N">
# - ALL_CAPS line (≥3 chars, no sentence punctuation) → <h2>
# - Line matching /^\d+\.\s/ → accumulate into <ol><li>
# - Line matching /^[•\-\*]\s/ → accumulate into <ul><li>
# - Non-blank remaining lines → <p dir="auto">
# - Blank lines → close current block, open next paragraph
```
Test this service with real DocuSeal PDF fixtures. Adjust heuristics based on actual output quality.
### Step 2: Text View HTML on submission preview (read-only, no signing)
Add the tab switcher to `submissions/show.html.erb`:
- Tab list above the `#document_view` div
- PDF panel = existing `#document_view` content
- Text panel = rendered HTML from the service
Implement JavaScript (Stimulus controller or vanilla, depending on codebase patterns) for:
- ARIA attribute updates on tab switch (aria-selected, hidden on panels)
- Arrow key navigation
- localStorage persistence
Verify with VoiceOver and keyboard-only navigation before merging.
### Step 3: Adapt for signing form
Add the same tab switcher to `submit_form/show.html.erb`.
- Vue submission form panel remains outside the tab structure (always visible)
- Add sticky "Ready to sign?" banner inside Text panel
Verify that Vue component scroll-to-field behavior is unaffected when Text panel is active.
### Step 4: Consider font-size heuristics (Phase 2)
After Step 3 ships and user feedback is collected: extend `extract_page_texts()` to also store font-size segments using `text_nodes`. Use this data in the HTML service to emit `<h1>` vs `<h2>` based on actual font size rather than text-pattern heuristics alone.
---
## 6. Decision Summary
| Question | Answer |
|----------|--------|
| Tab pattern — WCAG compliant? | Yes. `role="tablist"`, `role="tab"` (with roving tabindex), `role="tabpanel"` (with `hidden`). Arrow keys navigate tabs, Tab moves into panel. |
| Text View content | Heuristic-parsed HTML: per-page `<section>`, heuristic headings as `<h2>`, numbered lists as `<ol>`, bullets as `<ul>`, rest as `<p dir="auto">`. Single-scroll full document. |
| Text generation approach | Approach B (heuristic) for MVP. Approach D (font-size from `text_nodes`) for Phase 2 enhancement. Reject A (`<pre>`), C (LLM), and standalone D for MVP. |
| Signing form | Option A + C: Text View read-only, Vue form panel always visible, sticky "return to sign" CTA in text panel. |
| Scope | Both pages. Preview first, signing form second. Template builder: skip. |
| Per-page or full document | Full-document single scroll with per-page `<section>` markers. |
| Key pitfalls | DaisyUI tabs incompatible with ARIA APG; 15-page cap needs graceful handling; `hidden` attribute (not CSS) on inactive panels; text quality disclosure; localStorage tab state persistence; `dir="auto"` for RTL. |

@ -0,0 +1,222 @@
# Expert Opinion: Should PDF Text Be Made Visible to All Users?
**Date**: 2026-02-25
**Branch**: extract-content-from-pdf
**Question from product team**: Should the extracted PDF text (currently in `sr-only` divs) be made available to non-AT (non-assistive technology) users?
---
## Current Implementation
The implementation extracts raw text from each PDF page via Pdfium's `FPDFText_GetText` API during the document processing pipeline (`lib/templates/process_document.rb` → `extract_page_texts()`). The text is stored in blob metadata at `attachment.metadata['pdf']['pages_text']` as a hash keyed by page index string (e.g., `{ "0" => "...", "1" => "..." }`).
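Because the keys are strings, any code that walks the whole document should order them numerically; a plain string sort would place `"10"` before `"2"`. A minimal sketch, with a hypothetical helper name:

```ruby
# Hypothetical helper: returns page texts in page order from the
# string-keyed hash, e.g. { "0" => "...", "1" => "..." }.
def ordered_page_texts(pages_text)
  pages_text.sort_by { |index, _text| index.to_i }.map { |_index, text| text }
end
```

For example, `ordered_page_texts({ '10' => 'k', '2' => 'c', '0' => 'a' })` yields the texts for pages 0, 2, and 10 in that order.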
In both `app/views/submit_form/show.html.erb` and `app/views/submissions/show.html.erb`, the text is rendered as:
```erb
<% if (page_text = document.blob.metadata.dig('pdf', 'pages_text', index.to_s)).present? %>
  <div class="sr-only" role="region" aria-label="Page N text content"><%= page_text %></div>
<% end %>
```
The `sr-only` class (Tailwind: `position: absolute; width: 1px; height: 1px; overflow: hidden; clip: rect(0,0,0,0)`) hides this content from all sighted users while keeping it in the DOM for screen readers. The text is placed directly after the page `<img>` and before the absolutely-positioned field overlay div.
---
## 1. The Case FOR Making Text Visible to All Users
### Browser "Find in Page" (Cmd/Ctrl+F)
This is the strongest argument for keeping text in the DOM — and it already works with `sr-only`. Content hidden with `sr-only` (CSS clip, not `display:none` or `visibility:hidden`) IS searchable via browser Find. This means the current implementation already provides this benefit without any additional work. Sighted users who press Cmd+F and search for a word that appears in the document will get a match — they just will not be able to see the highlighted result because it's a 1×1px invisible element. This is a meaningful half-win, but it's not the full experience.
If text were rendered visibly, Find in Page would highlight the matched text in context, which is genuinely useful for long contracts where a user is scanning for a specific clause.
### Copy/Paste
Users frequently need to copy text from documents they are signing. The page images are raster renders — nothing is selectable. If users need to copy a reference number, an address, or a clause, they must download the PDF separately. Visible text would eliminate this friction entirely.
### Machine Translation / Browser Translation
Chrome's built-in translation, third-party extensions (e.g., DeepL), and OS-level translation tools all require actual visible text in the DOM. With `sr-only`, the text is technically in the DOM but browser translation tools may or may not act on hidden content — behavior is inconsistent across browsers. For multilingual signers who receive contracts in a language they are less fluent in, being able to trigger instant browser translation is a meaningful accessibility gain beyond the AT population.
### Low-Vision Users Who Are Not Screen Reader Users
WCAG explicitly distinguishes screen reader users from the broader low-vision population. Someone with 20/200 vision using 400% browser zoom, or someone using browser text-size overrides, is not served by `sr-only`. They interact with the visual rendering. They cannot read a raster JPEG zoomed to 400% — text becomes a blocky mess. Exposed text would reflow properly with zoom and respect OS font size preferences.
### Cognitive Accessibility
Users with dyslexia, ADHD, or reading disabilities often benefit from using browser extensions like Helperbird, BeeLine Reader, or OpenDyslexic that restyle text for easier reading. These tools only work on rendered DOM text. A sighted person with dyslexia who uses a rendering aid gets zero benefit from the current implementation.
### Low-Bandwidth / Low-End Devices
JPEG page previews are already lazy-loaded and compressed, but on very slow connections or older devices, the images may not load at all. If text were visible, the document content would degrade gracefully — users could read the contract even if images are slow or fail.
### Print Accessibility
When users print the submission preview or signing view, `sr-only` content does not print legibly: the same 1px clipping rules apply in print unless a print stylesheet overrides them. Visible text would print, giving sighted users a readable text version alongside or instead of image renders.
---
## 2. The Case Against / Concerns
### Layout Fidelity vs. Extracted Text Mismatch — This Is the Decisive Concern
Pdfium's `FPDFText_GetText` returns Unicode text in the order it appears in the PDF's content stream. For simple, linearly-structured PDFs (one-column contracts, standard letter format), this order is identical to reading order and the output is clean prose.
For complex PDFs — multi-column layouts, tables, forms with floating labels, PDFs that are digitally-created but with non-logical drawing order — the extracted text can be:
- Out of reading order (columns mixed together)
- Missing separators between adjacent text runs (words concatenated with no space)
- Duplicated (headers and footers repeated on every page)
- Garbled (text drawn right-to-left in the content stream for visual effect but extracted left-to-right by Pdfium)
DocuSeal's actual users will be sending a wide variety of PDFs. Contracts from Word exports are usually fine. Scanned-with-OCR PDFs, Adobe InDesign exports, and PDFs generated from complex Excel templates are often problematic. If the visible text shown to a signer says something different from what the page image shows — even just word ordering that differs from what is visually readable — this creates a trust and legal problem. A signer might rely on the extracted text version to read a clause and then dispute having signed a different version of that clause.
This risk is specific to DocuSeal's legal document use case. It is not a fatal concern for a view-only tool such as the Google Drive file viewer, but it is a real concern in signing workflows where the exact text has legal weight.
### Visual Noise in the Signing Flow
The signing interface is carefully designed around a focused, linear task: read the document, complete fields, sign. Adding a visible text block adjacent to or beneath each page image would require a significant UX design decision about how it coexists with:
- Overlaid form fields (which are absolutely positioned over the page image)
- The sticky signature/form panel at the bottom
- The document title header
A naive implementation — dump the text in a visible div right after the image — would make the page look like a duplicated document with the image on top and a text transcript below. This would confuse nearly all users.
### Duplicate Content Perception
Sighted users who see both the rendered PDF page and a text transcript directly below it will immediately ask "why is there a text version of what I can already see?" If the text ordering is slightly different from the visual layout, the confusion escalates to "is this the same document?" This is not a hypothetical — it is the exact UX problem that early PDF viewer accessibility overlays created.
### Performance
Text extraction already happens at document processing time and is stored in metadata. Rendering it visible adds no server-side cost. The only possible concern is DOM size: a dense 15-page contract may have 20,000+ characters of extracted text. That text is already in the DOM inside the `sr-only` divs, so making it visible does not change DOM size. Performance is therefore not a concern for this change.
### Text Extraction Completeness
`extract_page_texts()` caps at `MAX_NUMBER_OF_PAGES_PROCESSED = 15` pages. Documents beyond page 15 have no extracted text at all. If a visible text view were added, users would see text for pages 1-15 and nothing for pages 16+. This inconsistency would be confusing and would need to be addressed before shipping a visible text feature.
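One way to address the inconsistency is to compute which pages have text and tell the user explicitly. This is a sketch under stated assumptions: `text_coverage` and `coverage_notice` are hypothetical helpers, and `pages_text` is assumed to be a Hash keyed by zero-based page index, which may not match the stored metadata format.

```ruby
# Hypothetical helper: report which pages have extracted text.
# Assumption: `pages_text` maps zero-based page index => extracted text.
def text_coverage(page_count, pages_text)
  all_pages = (0...page_count).to_a
  # A page counts as covered only if it has non-blank text.
  covered = all_pages.reject { |i| pages_text[i].to_s.strip.empty? }
  missing = all_pages - covered
  { covered: covered, missing: missing, complete: missing.empty? }
end

# Hypothetical helper: build a user-facing notice, or nil when every
# page has text and no notice is needed.
def coverage_notice(page_count, pages_text)
  coverage = text_coverage(page_count, pages_text)
  return nil if coverage[:complete]

  "Text is available for #{coverage[:covered].size} of #{page_count} pages. " \
    'Refer to the page images for the remaining pages.'
end

# An 18-page document where only the first 15 pages were processed.
pages_text = (0...15).to_h { |i| [i, "Page #{i + 1} body"] }
puts coverage_notice(18, pages_text)
```

The same coverage data could drive a per-page fallback message for AT users instead of silently omitting the text block.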
---
## 3. Patterns from Other Document Tools
### Google Docs Viewer (docs.google.com/viewer)
Uses a canvas/SVG rendering approach. The actual text layer is rendered invisibly over the canvas in correctly-positioned spans — exact pixel coordinates from the PDF content stream. Find in Page works because the text spans are in the DOM. The text is not visible as a separate block; it overlays the rendered image at the correct coordinates. This is the gold standard but requires knowing the exact bounding box of every character — which the current DocuSeal implementation does NOT do (it stores only the plain text string, not character positions).
### Adobe Acrobat Web (documentcloud.adobe.com)
Renders PDF as a canvas with a separate invisible text layer using absolute-positioned spans at the correct character positions. This enables accurate selection, copy/paste, and Find. Again, requires coordinate-level text positioning data.
### DocuSign
Does not expose extracted text at all in the signing flow. The page is rendered as an image, and no text layer is added. Accessibility is handled through a separate "accessible view" feature that presents a simplified HTML form with field labels and instructions — it does not attempt to show the document text itself. DocuSign's approach acknowledges that trying to make arbitrary PDFs accessible inline is too risky; it separates the accessible interface from the document rendering entirely.
### GOV.UK
For documents published on GOV.UK, the standard pattern is to always provide an HTML version alongside the PDF. The HTML version is a fully authored, human-reviewed alternative, not an automated extraction. GOV.UK's accessibility guidance explicitly states that automatically extracted text from PDFs is unreliable and should not be treated as an accessible alternative without human review. The standard is: if you need an accessible document, publish HTML.
### PDF.js (Mozilla's open-source PDF renderer, used in Firefox)
Uses the same coordinate-based invisible text layer approach as Google and Adobe, allowing Find and selection on the exact rendered text. The text spans are positioned to overlay each character on the canvas render. This approach requires per-character bounding box data.
---
## 4. WCAG 2.2 Perspective
### Does hiding text from sighted users create compliance issues?
No. WCAG 2.2 does not require that all content be visible to all users. What it requires is that content and functionality available via one modality (e.g., visual) also be available via other modalities (e.g., keyboard, AT). The `sr-only` pattern is explicitly supported by WCAG — it is the recommended technique for providing accessible names and supplementary context to AT users without cluttering the visual interface (WCAG Technique C7: Using CSS to hide a portion of the link text).
The content in the `sr-only` divs is supplementary — it provides a text alternative for image-rendered PDF pages, satisfying WCAG 1.1.1 (Non-text Content) for AT users. Sighted users are served by the visual image render. This is perfectly compliant.
### Does making it visible help WCAG conformance?
Not meaningfully. WCAG is already satisfied by the current `sr-only` implementation for the criteria it addresses. Visibility to sighted users does not improve WCAG scores because the criteria being addressed (1.1.1, 1.3.1) are about AT access, not sighted-user access.
However, there are two WCAG criteria where visible text is relevant in a way that `sr-only` text is not:
**1.4.4 Resize Text (Level AA)**: Content rendered as an image cannot be resized without loss of quality. If sighted users with low vision rely on browser text zoom, the JPEG page renders will degrade. Visible text would reflow and scale properly. However, DocuSeal's page image approach is a standard PDF viewer pattern, and WCAG 1.4.4 has an exception for "images of text" — text in the page preview images is exempt from this criterion if the same visual presentation cannot reasonably be achieved using actual text.
**1.4.5 Images of Text (Level AA)**: The pages-as-images approach technically fails this criterion if the information conveyed by the text in those images could be presented as actual text without significantly changing the presentation. However, this criterion also has an exception for "essential" presentations — and a PDF document signing interface where layout fidelity is legally important qualifies as essential. Enterprise document tools universally use this exception.
### Does anything in WCAG prohibit `sr-only` content with `role="region"`?
One nuance: WCAG Technique ARIA20 (Using the region role to identify a region of the page) requires that landmark regions be named and that they not be overused, because too many landmark regions make AT navigation noisy. A 15-page document with `role="region"` on each page's text block creates 15 landmarks in the AT landmark navigation menu. This is a real usability concern for screen reader users, not a WCAG failure per se, but it is worth reconsidering whether `role="region"` is appropriate here.
The better choice for a supplementary text alternative that is not a primary navigation landmark is probably no explicit ARIA role at all, or a `role="doc-pagebreak"` separator before each page inside a well-labeled container. Alternatively, wrapping all pages in a single `role="article"` or `role="document"` container would remove the 15 per-page landmarks; neither of those roles is itself a landmark, so the text would no longer appear in landmark navigation at all. This is a minor concern but worth addressing.
---
## 5. Recommended UX Patterns if Text Were Exposed to All Users
If the product team decides to expose text to sighted users, here are the options ranked by quality:
### Option A: Coordinate-Positioned Invisible Text Layer (Like PDF.js / Google Docs)
Requires per-character bounding box data from Pdfium — which IS available via `FPDFText_GetCharBox` and `FPDFText_GetRect` (both are already attached in `lib/pdfium.rb`). Each character or word would be rendered in an absolutely-positioned `<span>` over the page image at the correct coordinates. Users could select text, copy it, and trigger browser Find with highlighted results at the correct positions.
**Verdict**: Best possible implementation. Provides all the sighted-user benefits without any visual noise. Requires significant additional engineering (per-character coordinates, word grouping, scaling to displayed size, handling RTL text). This is the right long-term target.
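The core of Option A is a coordinate conversion. The sketch below is illustrative only: `CharBox`, `css_box`, and `span_style` are hypothetical names, and the box is assumed to arrive as left/right/bottom/top values in PDF points with the origin at the bottom-left of the page, which is how `FPDFText_GetCharBox` reports it. Page rotation and crop-box offsets are ignored.

```ruby
# Hypothetical value object for one character box in PDF user space
# (points, origin at the bottom-left corner of the page).
CharBox = Struct.new(:left, :right, :bottom, :top, keyword_init: true)

# Convert a PDF-space box to CSS percentages relative to the page image.
# Percentages keep the overlay aligned at any displayed image size.
def css_box(box, page_width:, page_height:)
  width = page_width.to_f
  height = page_height.to_f
  {
    left: (box.left / width * 100).round(3),
    # CSS measures from the top edge, PDF from the bottom, so flip the axis.
    top: ((height - box.top) / height * 100).round(3),
    width: ((box.right - box.left) / width * 100).round(3),
    height: ((box.top - box.bottom) / height * 100).round(3)
  }
end

# Build an inline style for an absolutely-positioned <span>.
def span_style(box, page_width:, page_height:)
  css = css_box(box, page_width: page_width, page_height: page_height)
  "position:absolute;left:#{css[:left]}%;top:#{css[:top]}%;" \
    "width:#{css[:width]}%;height:#{css[:height]}%"
end

# A character near the top-left of a US Letter page (612 x 792 points).
box = CharBox.new(left: 72.0, right: 78.0, bottom: 700.0, top: 712.0)
puts span_style(box, page_width: 612.0, page_height: 792.0)
```

Using percentages rather than pixels avoids recomputing positions when the page image is resized, but word grouping and RTL handling would still need separate work.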
### Option B: Optional "Text View" Toggle
A toggle button in the document header switches between Image View (current) and Text View (plain accessible HTML). In Text View, page images are replaced with the extracted text in a readable, styled `<article>` element. The toggle state persists across page loads via localStorage.
**Verdict**: Good for specific use cases (mobile users on slow connections, low-vision users who prefer reflowing text, users who want to copy text). The risk of text extraction quality issues is mitigated because the label clearly says "text view" — users understand they are seeing a different representation. This is achievable with moderate engineering effort.
### Option C: Collapsible "Page Text" Accordion Below Each Page
A `<details>/<summary>` element below each page image, collapsed by default, labeled "Show text content for page N". Sighted users who want the text can expand it; others ignore it.
**Verdict**: Functional but poor UX. Most users will never discover it. The pattern is used in low-effort accessibility retrofits and signals "we added text as an afterthought." It also suffers from the layout mismatch trust problem: once visible, users may compare the accordion text to the image and notice discrepancies.
### Option D: Full-Document Side Panel or Drawer
A "Document Text" panel accessible via a toggle button, showing the full extracted text as a continuous readable document. Similar to how some PDF readers have a "reading mode" or "outline" panel.
**Verdict**: Good for power users, low discoverability for casual users. Avoids the per-page layout mismatch problem by presenting the text holistically. The side panel approach fits the existing UI pattern (the `parties_view` panel on the right in `submissions/show.html.erb` shows a similar right-panel design). Engineering effort is moderate.
### Option E: Keep sr-only, It's Sufficient
The current implementation serves AT users, satisfies all applicable WCAG criteria, and avoids the layout-fidelity trust problem inherent in exposing extracted text to sighted users. For a legal document signing platform, this is a defensible and reasonable product position.
---
## 6. Recommendation
**Keep the `sr-only` implementation as the primary path. Do not expose extracted text to sighted users in the current signing and preview flow.**
Here is the reasoning:
**The legal context is the deciding factor.** DocuSeal is not a PDF viewer or a reading application. It is a document signing platform. Users sign documents: they attest to the content of a specific visual representation. Pdfium's `FPDFText_GetText` returns text in content-stream order, which may differ from the visual reading order in complex PDFs. If a signer reads a clause from the extracted text version and signs, then later claims the text they relied on was different from the image they were presented, that creates ambiguity in the evidentiary record. The JPEG page images are the canonical document representation. Extracted text is a best-effort approximation.
**The sr-only approach already delivers most of the incidental benefits.** Browser Find in Page matches `sr-only` content (it is only hidden via CSS clip, not `display:none`), although the highlighted match itself is not visible. The text is in the DOM. It contributes to browser translation in most engines. It is fully searchable by crawlers. The only concrete benefit fully blocked by `sr-only` is user-initiated text selection and copy, which requires the text to be visibly rendered.
**If copy/paste is a priority, implement it narrowly.** If the product team hears repeated feedback that users want to copy text from documents, the right response is to add a per-page "Copy page text" button that copies the extracted text to clipboard without rendering it visibly on the page. This gives users the utility without the layout-fidelity trust problem. The button can be visually subtle (small icon button with accessible label) and appear on hover over the page image.
**The right long-term investment is the coordinate text layer (Option A).** If the team wants to invest in a proper text-layer feature, the Pdfium bindings already include `FPDFText_GetCharBox` and `FPDFText_GetRect`. Building coordinate-positioned text spans would provide Find-in-Page highlighting at correct positions, text selection, copy/paste, and low-vision zoom support — all without the layout mismatch trust problem, because the text would visually align with the image underneath. This is non-trivial engineering (a few days of work) but is the architecturally correct solution.
**Fix the `role="region"` overuse.** Regardless of the visibility decision, 15 `role="region"` landmarks per document is a real AT navigation problem. Consider removing the role entirely and relying only on the `aria-label` on a plain `<div>`, or wrapping all page text divs in a single `role="article"` or `role="document"` container per document (neither role is a landmark).
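A minimal sketch of the single-wrapper variant follows. It is not the current view code: `page_text_markup` is a hypothetical helper, `pages_text` is assumed to be an Array of per-page strings, and the choice of `role="note"` for the per-page blocks is one of the options discussed in this report, not a settled decision.

```ruby
require 'erb'

# Hypothetical helper: render all page text inside one labeled wrapper
# instead of one role="region" landmark per page.
# Assumption: `pages_text` is an Array of per-page strings in page order.
def page_text_markup(pages_text, document_name:)
  pages = pages_text.each_with_index.filter_map do |text, index|
    next if text.to_s.strip.empty? # emit nothing for pages without text

    label = "Page #{index + 1} text"
    %(<div role="note" aria-label="#{label}">#{ERB::Util.html_escape(text)}</div>)
  end
  return '' if pages.empty?

  # Escape the document name too, since it ends up inside an attribute.
  wrapper_label = ERB::Util.html_escape("Text of #{document_name}")
  %(<div class="sr-only" role="article" aria-label="#{wrapper_label}">#{pages.join}</div>)
end

puts page_text_markup(['Clause 1 & 2', '', 'Signature page'], document_name: 'NDA')
```

Page numbers stay tied to the original page index, so a skipped page does not shift the labels of the pages after it.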
### Decision Matrix
| Concern | sr-only (current) | Visible text block | Text view toggle | Coordinate layer |
|---|---|---|---|---|
| WCAG compliance | Full | Full | Full | Full |
| Layout mismatch risk | None (hidden) | High | Medium (clearly labeled) | None (aligned) |
| Copy/paste for sighted | No | Yes | Yes | Yes |
| Find in Page highlighting | No (match not visible) | Yes | Yes | Yes |
| Low-vision reflow | No | Yes | Yes | Yes |
| Browser translation | Partial | Yes | Yes | Yes |
| Legal/trust risk | None | Real | Low | None |
| Engineering effort | Done | Low | Medium | High |
| UX confusion | None | High | Low | None |
**Recommendation summary**: Ship Option E now (the `sr-only` implementation is correct). Add a narrow "Copy page text" icon button for copy/paste utility. Plan Option A (coordinate text layer) as a future accessibility investment.
---
## Minor Issues to Fix in the Current Implementation
1. **Remove `role="region"` from page text divs** or replace all per-page regions with a single per-document container. Up to 15 per-page landmarks per document create AT navigation clutter. Per-page text divs should be plain `<div>` with `aria-label` only, or use `role="note"`. Of the two, `role="note"` is the safer choice, because `aria-label` on a `<div>` with no role is not reliably announced by assistive technology.
2. **15-page cap consistency**: If a document exceeds 15 pages, pages 16+ have no text alternative at all. Consider whether the `sr-only` div should be omitted (as it is currently) or whether a fallback message ("Text content not available for this page") is more honest for AT users.
3. **Scanned PDF handling**: The current implementation gracefully omits `pages_text` for scanned PDFs. This is correct — do not emit a misleading sr-only div for pages with no extractable text. The current code handles this properly.