Module 1: Text preprocessing and linguistic signals

AINS6004 — Natural Language Processing

Essential Question

What is lost and gained when language becomes data?

Scenario

A product team is evaluating an NLP workflow before using it in customer-facing communication.

Stakeholders: product manager, support lead, privacy reviewer, and model evaluator

Core Moves

  • Define the decision boundary

  • Compare baseline and alternative

  • Interpret evidence and assumptions

  • Identify failure modes

  • Recommend next action

Lab & Assignment

Compare tokenization choices on a small corpus.

Artifact: an NLP evaluation packet covering task framing, retrieval/evaluation design, and deployment guardrails, focused on text preprocessing and linguistic signals. It documents the lab's comparison of tokenization choices on a small corpus.
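
One possible starting point for the lab is sketched below, using only the Python standard library. It runs three tokenizers (whitespace, regex word/punctuation, and character-level) over a toy corpus and reports token count, vocabulary size, and type-token ratio for each. The corpus sentences, the function names, and the choice of these three tokenizers are assumptions made for illustration; the course does not prescribe them, and a real packet would likely use the team's own support messages and may add a subword tokenizer.

```python
import re
from collections import Counter

# Hypothetical toy corpus (illustrative only); swap in real, privacy-reviewed text.
CORPUS = [
    "I can't log in to my account!",
    "Refund for order #4521 hasn't arrived.",
    "Thanks, the e-mail reset worked.",
]


def whitespace_tokenize(text):
    """Baseline: split on runs of whitespace, keeping case and punctuation."""
    return text.split()


def regex_tokenize(text):
    """Alternative: lowercase, then emit word runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def char_tokenize(text):
    """Alternative: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def summarize(tokenizer, corpus):
    """Return token count, vocabulary size, and type-token ratio for a corpus."""
    tokens = [tok for doc in corpus for tok in tokenizer(doc)]
    types = Counter(tokens)
    ratio = round(len(types) / len(tokens), 3) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "type_token_ratio": ratio}


TOKENIZERS = {
    "whitespace": whitespace_tokenize,
    "regex": regex_tokenize,
    "char": char_tokenize,
}


def compare(corpus):
    """Apply every tokenizer to the same corpus so the results are comparable."""
    return {name: summarize(fn, corpus) for name, fn in TOKENIZERS.items()}


if __name__ == "__main__":
    for name, stats in compare(CORPUS).items():
        print(name, stats)
```

The numbers alone do not settle the choice. Looking at what each tokenizer does to "can't", "#4521", and "e-mail" gives concrete failure modes to record in the packet, for example contractions split into fragments or order numbers detached from their "#" marker.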