Architecture
The v2 import tree at a glance
Import tree
hwpapi
βββ App # core.app.App β slim facade (β€15 public members)
βββ Document # document.Document β per-document surface (app.doc)
βββ collections/ # dict-like aggregations under app.doc.*
β βββ Collection # Protocol: __getitem__, __iter__, __len__,
β β # __contains__, names, filter
β βββ fields # FieldCollection + Field
β βββ bookmarks # BookmarkCollection + Bookmark
β βββ hyperlinks
β βββ images
β βββ paragraphs # ParagraphCollection + Paragraph + Run
β βββ styles
β βββ tables # TableCollection + Table + Cell
βββ context/
β βββ scopes # charshape_scope, parashape_scope, styled_text
βββ io/
β βββ open # open_file, new_document
β βββ export # export_pdf, export_image, export_text
βββ errors # HwpApiError hierarchy + wrap_com_error
βββ units # mm, cm, inch, pt, parse, to_mm, ...
βββ logging # get_logger, HWPAPI_LOG_LEVEL
βββ low/ # escape hatch
βββ actions # _Actions, _Action (900+ action wrappers)
βββ engine # Engine, Engines, Apps
βββ parametersets # ParameterSet classes (CharShape, ParaShape, ...)
The two-layer model
hwpapi v2 is a layered API, not a levelled one. Both layers are first-class:
- High layer (
hwpapi.Appβapp.doc.*) β the default surface. Opinionated, collection-oriented, one way to do each thing. - Low layer (
hwpapi.low.*) β the raw action + ParameterSet + engine primitives. Officially supported. The high layer is implemented on top of the low layer, not alongside it.
See ADR-001 for the full rationale.
The Collection protocol
Every dict-like aggregation under app.doc.* implements hwpapi.collections.Collection:
from typing import Callable, Iterator, Protocol, runtime_checkable
@runtime_checkable
class Collection(Protocol):
def __getitem__(self, key): ...
def __iter__(self) -> Iterator: ...
def __len__(self) -> int: ...
def __contains__(self, key) -> bool: ...
def names(self) -> list[str]: ...
def filter(self, predicate: Callable[..., bool]) -> list: ...This is a structural contract, not a nominal one β the collection classes do not inherit from Collection. Runtime checks use isinstance(coll, Collection) thanks to @runtime_checkable.
Collections that support mutation add __setitem__, __delitem__, and clear(). That set is not part of the Protocol, because some collections are read-only (e.g. paragraphs β you canβt delete a paragraph by index without affecting surrounding ones).
The element value-object pattern
Subscripting a collection returns a lightweight element β a reference object that carries (app, identifier) and looks up the underlying state lazily. Elements:
- are cheap to construct (no COM calls)
- hit COM at property-read time
- never cache (reading twice hits COM twice)
Why not dataclasses? Because HWP state can change under you (user undo, concurrent script) and caching a snapshot would lie. When you want a snapshot, take one explicitly (e.g. pset.clone() on a ParameterSet, or a plain dict for element state).
Ownership and lifecycle
App
βββ Engine (via app.engine)
βββ HwpObject (raw COM; via app.api or app.engine.impl)
βββ Document (via app.doc; cached_property)
βββ Collections (per-collection, via cached_property)
Appowns theEngine.Documentdoes not own a COM handle. It holds a reference to its owningAppand fetchesapp.engine.implat the point of use.- Collections similarly reach into
self._app.engine.implon every call β they are stateless wrappers over HWPβs live state.
This means: once app.close() runs, reading from the cached Document or any cached collection will raise β the indirection through app.engine.impl surfaces the invalidation.
The facade boundary
The v2 facade (App + Document + collections + context + io + errors + units) is intentionally small. The test of βwhat belongs on the facadeβ is:
Does removing this symbol break the one-screen
dir(App)guarantee, or does it trade a clear high-level affordance for a low-level call?
If the answer is βbreak the guaranteeβ or βtrade a clear affordanceβ, it stays. Everything else goes to hwpapi.low or a sibling package.