Document Description and Processing in Scheme
Another way to query HTML documents

LAML supports extracting information from HTML documents in a way similar to XPath.

(define doc
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (p 'id "5" "First paragraph") 
    (p (em "Second") "paragraph")
    (div
      (p "Third paragraph")))))

A sample HTML document.

  • The div instances in the body instance

    • (match-ast doc (location-step 'child "body") (location-step 'child "div"))

  • All p instances

    • (match-ast doc (location-step 'descendant "p"))

  • All p instances that have an id attribute

    • (match-ast doc (location-step 'descendant "p" (nt:for-which (location-step 'attribute "id"))))
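To make the idea of chained location steps more concrete, the three queries above can be imitated in plain Scheme. This is only a minimal sketch under stated assumptions: it does not use LAML and does not show how match-ast is implemented. The tree representation (an element is a list of tag, attribute list and children) and the names make-element, sample-doc, keep, child-step, descendant-step, with-attribute and match-steps are all hypothetical, chosen for illustration.

```scheme
;; Hypothetical, simplified document model (NOT LAML's AST):
;; an element is (tag attributes child ...); text nodes are plain strings.
(define (make-element tag attrs . children) (cons tag (cons attrs children)))
(define (element? x) (pair? x))
(define (element-tag e) (car e))
(define (element-attributes e) (cadr e))
(define (element-children e) (cddr e))

;; The sample document from above, rebuilt in the simplified model.
(define sample-doc
  (make-element "html" '()
    (make-element "head" '()
      (make-element "title" '() "Table Examples"))
    (make-element "body" '()
      (make-element "p" '(("id" . "5")) "First paragraph")
      (make-element "p" '() (make-element "em" '() "Second") "paragraph")
      (make-element "div" '()
        (make-element "p" '() "Third paragraph")))))

;; Keep the elements of lst that satisfy pred.
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

;; Child axis: the direct element children of a node with the given tag.
(define (child-step tag)
  (lambda (node)
    (keep (lambda (c) (and (element? c) (equal? (element-tag c) tag)))
          (element-children node))))

;; All element descendants of a node, in document order.
(define (descendants node)
  (apply append
         (map (lambda (c) (if (element? c) (cons c (descendants c)) '()))
              (element-children node))))

;; Descendant axis: every descendant element with the given tag.
(define (descendant-step tag)
  (lambda (node)
    (keep (lambda (d) (equal? (element-tag d) tag)) (descendants node))))

;; Predicate: restrict the results of a step to elements carrying an attribute.
(define (with-attribute step name)
  (lambda (node)
    (keep (lambda (e) (assoc name (element-attributes e))) (step node))))

;; Apply the steps from left to right, starting at a single node.
(define (match-steps node . steps)
  (let loop ((nodes (list node)) (remaining steps))
    (if (null? remaining)
        nodes
        (loop (apply append (map (car remaining) nodes)) (cdr remaining)))))

;; The three queries, counted:
(length (match-steps sample-doc (child-step "body") (child-step "div")))      ; 1
(length (match-steps sample-doc (descendant-step "p")))                       ; 3
(length (match-steps sample-doc (with-attribute (descendant-step "p") "id"))) ; 1
```

Each step is a function from one node to a list of matching nodes, so a path is just the left-to-right composition of steps. In XPath notation, evaluated relative to the html element, the three queries would read roughly body/div, descendant::p and descendant::p[@id].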