Document Description and Processing in Scheme
How to query HTML documents

LAML provides several simple tree traversal functions for extracting information from HTML documents.

(define doc
  (html
    (head
      (title "Table Examples"))
    (body
      (p 'id "5" "First paragraph")
      (p (em "Second") "paragraph")
      (div
        (p "Third paragraph")))))

A sample HTML document.

  • All text in all paragraphs

    • (find-asts doc "p" ast-text)

  • The first paragraph

    • (find-first-ast doc "p")

  • All constituents that satisfy a given predicate

    • (traverse-and-collect-all-from-ast doc (lambda (ast) ...) id-1)
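
To give a feel for what such traversal functions do, here is a minimal sketch in plain Scheme that does not use LAML at all. It models the sample document as a nested list and defines three hypothetical helpers (toy-collect-all, toy-find-all, toy-find-first) that loosely mirror the queries above. The list representation, the omission of attributes, the space-separated text joining, and all toy- names are assumptions made for illustration; LAML's own ASTs and functions may behave differently.

```scheme
; Toy model only: an element is a list (name child ...), where name is a
; symbol and each child is a string or another element. Attributes are omitted.
(define toy-doc
  '(html
    (head (title "Table Examples"))
    (body
     (p "First paragraph")
     (p (em "Second") "paragraph")
     (div (p "Third paragraph")))))

; Collect (transform node) for every element that satisfies pred?, in
; document order. Matching elements are also searched for nested matches.
(define (toy-collect-all node pred? transform)
  (if (pair? node)
      (let ((below (apply append
                          (map (lambda (child)
                                 (toy-collect-all child pred? transform))
                               (cdr node)))))
        (if (pred? node)
            (cons (transform node) below)
            below))
      '()))  ; a string contains no elements

; Join all text below a node, separating the pieces with a single space.
(define (toy-text node)
  (cond ((string? node) node)
        ((null? (cdr node)) "")
        (else
         (let loop ((rest (cddr node)) (acc (toy-text (cadr node))))
           (if (null? rest)
               acc
               (loop (cdr rest)
                     (string-append acc " " (toy-text (car rest)))))))))

; All elements with the given name, each passed through transform.
(define (toy-find-all node name transform)
  (toy-collect-all node (lambda (n) (eq? (car n) name)) transform))

; The first element with the given name, or #f if there is none.
(define (toy-find-first node name)
  (let ((hits (toy-find-all node name (lambda (n) n))))
    (if (null? hits) #f (car hits))))

(toy-find-all toy-doc 'p toy-text)
; => ("First paragraph" "Second paragraph" "Third paragraph")

(toy-find-first toy-doc 'p)
; => (p "First paragraph")

(toy-collect-all toy-doc
                 (lambda (n) (eq? (car n) 'em))
                 (lambda (n) n))
; => ((em "Second"))
```

In this sketch the predicate-based collector is the general operation, and the two name-based queries are defined in terms of it. That is a design choice of the toy model, not a statement about how LAML implements its functions.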