` elements become `` HTML table cells.
4. Any `@role` attribute on the XML element becomes the `@role` attribute on the HTML5 element. If there is no `@role` attribute on the XML element, then the element name becomes the HTML5 `@role` attribute.
5. The `@class` attribute on the HTML5 element is composed of the local name of the XML element, followed by an underscore character and the XML `@role` attribute value if it exists, followed by the XML `@class` attribute values, space separated. For instance, the element `` becomes the `@class="level_Chapter indent1"` attribute.
6. Any `@xml:lang` attribute on the XML element becomes the HTML5 `@lang` attribute.
7. The following attributes carry over from the XML element to an identically named HTML5 attribute: `@style`, `@id`, `@href`, `@idref`, `@src`, `@alt`, `@colspan`, and `@rowspan`.
8. All other attributes carry over to the HTML5 by inserting the prefix `"data-"` + namespacePrefix + `"-"` + XML attribute. For example, the `@name` attribute on the XML element becomes the `@data-uslm-name` attribute.
9. The entire resulting HTML5 document fragment is ordinarily placed within the HTML5 `` element.
10. CSS classes are associated with the resulting HTML5 fragment using well-established HTML5 methods.
The transformation from HTML5 back to USLM-based XML is accomplished by reversing the process described above.
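The attribute rules above (rules 4 through 8) can be sketched in a few lines. This is a minimal illustration, not the official transformation: the function name `html_attributes`, the plain-dictionary representation of attributes, the `"xml:lang"` key spelling, and the default `uslm` namespace prefix are all assumptions made for the example.

```python
# Attributes that keep their name in HTML5 (rule 7).
CARRY_OVER = {"style", "id", "href", "idref", "src", "alt", "colspan", "rowspan"}


def html_attributes(local_name, xml_attrs, ns_prefix="uslm"):
    """Map one XML element's attributes to HTML5 attributes (hypothetical helper)."""
    html = {}
    role = xml_attrs.get("role")

    # Rule 4: @role carries over; the element name is the fallback.
    html["role"] = role if role is not None else local_name

    # Rule 5: local name, then "_" + role if present, then the XML @class values.
    class_parts = [local_name + ("_" + role if role else "")]
    class_parts.extend(xml_attrs.get("class", "").split())
    html["class"] = " ".join(class_parts)

    for name, value in xml_attrs.items():
        if name in ("role", "class"):
            continue  # already handled above
        if name == "xml:lang":
            html["lang"] = value  # rule 6
        elif name in CARRY_OVER:
            html[name] = value  # rule 7
        else:
            html[f"data-{ns_prefix}-{name}"] = value  # rule 8
    return html
```

Calling `html_attributes("level", {"role": "Chapter", "class": "indent1"})` yields the `level_Chapter indent1` class value used in the example for rule 5.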
# 9 Hierarchical Model
## 9.1 Concept
Most legislative documents have a well-defined hierarchical structure of numbered levels. The abstract model provides a general-purpose hierarchy necessary for legislative documents. This hierarchy does not impose restrictions, but instead allows any `` to be placed within any ``. This flexibility allows varying structures, some of which are anomalous, to be supported without adding layers of complexity.
## 9.2 Levels
The fundamental unit of the hierarchy is the `` element. A discussion below describes how the `@class` attribute and XML schema substitution groups can be used to subclass this fundamental unit to produce the various levels found in United States legislation.
A level is composed of (1) a `` identification designation, (2) an optional ``, and (3) either primarily textual ``, lower hierarchical `` children, or a few other possible elements.
## 9.3 Big Levels vs. Small Levels
The principal level is the `` level. Levels above the `` level are referred to as "big" levels and levels below the `` level are referred to as "small" levels.
| Level Group | Levels |
| --- | --- |
| Big Levels | title, subtitle, chapter, subchapter, part, subpart, division, subdivision |
| Primary Level | section |
| Small Levels | subsection, paragraph, subparagraph, clause, subclause, item, subitem, subsubitem |
The primary difference between big levels and small levels is in how they are referred to in references. Big levels and primary levels are referred to using a prefix to identify the level's type. Small levels are referred to simply by using the number designation assigned to the level in a level hierarchy. Further details are discussed below under Referencing Model.
## 9.4 Sandwich Structures
Sandwich structures are hierarchical levels that start and/or end with text rather than lower levels. This structure is quite common. Typically, some text will introduce the next lower level or will follow the last item of a lower level. The `` (French for "hat") and `` elements are provided for this structure. The `` precedes the lower levels and the `` follows the lower levels. The `` element can also be used for cases where interstitial text is found between elements of the same level.
One specific type of continuation text is a paragraph-like structure beginning with "Provided that" or "Provided". The `` element is used in this case. Multiple provisos may exist.
## 9.5 Table of Contents
A table of contents (TOC) model is provided to model the level hierarchy. A TOC can appear either at the top of a hierarchy or at lower levels, and in the main part of the document or in an appendix. The root of the TOC structure is the `` element and the levels can be arranged into a hierarchy of `` elements. Attributes from the description group are used to define the information in the hierarchy.
The `` structure can be intermixed with the `` structure to define a tabular layout for a table of contents.
# 10 Table Model
Two table-like models can be used with USLM: (1) a column-oriented model and (2) the HTML table model.
## 10.1 Column-Oriented
Use the column-oriented `` model when information related to the legislative structure is to be arranged in a column- or grid-oriented fashion, but not a true table. The advantage of the column-oriented `` model is that it is defined within USLM, so it can contain other USLM elements. The drawback is that it is a non-standard table model, so it is not inherently understood by tools that handle standard HTML tables.
The `` model is intended to be flexible. It provides a `` element for defining individual rows. When there is a row structure, the parts of each row that belong within each column are identified using the `` element. Any direct child of the `` element is considered to be a row unless it is a `` element.
However, when the `` element is a direct child of the `` element, then there is no notion of rows, and the columns form a basic structure of the ``. If `` contains any `` child, then it must only contain ``s as children.
Like HTML tables, the `` model supports the `@colspan` and `@rowspan` attributes.
## 10.2 HTML Tables
Use the HTML `` model when (1) information is arranged in a tabular structure, (2) there is little information within the table that is part of the legislative structure, or (3) the structure is regarded as a normal table with gridlines and table cells.
An embedded HTML table will look something like this:
```
Schedule 1
…
… |
…
…
```
# 11 Identification Model
## 11.1 Concept
Elements are assigned identifiers or names for two primary purposes. The first is to be able to reliably refer to the element throughout its lifetime, regardless of how it might be altered. The second is to be able to address the item based on its current state. To support both of these purposes, the available attributes are `@id`, `@temporalId`, `@name`, and `@identifier`.
## 11.2 Immutable Identifiers
Immutable identifiers are unchanging identifiers that are assigned when the element is created and do not change throughout the lifetime of the element. This makes them reliable handles with which to access the elements in a database or repository system. The `@id` attribute is used for this purpose. It is defined as an XML Schema ID, which requires that all `@id` attribute values be guaranteed to be unique within a document, with no exceptions.
For the purposes of document management in USLM, especially in the amending cycle, `@id` values should be computed as GUIDs (Globally Unique Identifiers) with an "id" prefix. This means that they should be computed using an algorithm that guarantees that no two identifiers, in any document, anywhere, will ever be the same. This is a broader definition of uniqueness than that imposed by the XML Schema ID definition. There are many tools available to generate GUIDs.
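As a minimal sketch of this recommendation, the following uses Python's standard `uuid` module to produce a random GUID with the "id" prefix. The function name `new_element_id` is hypothetical, and a version 4 (random) UUID is only one of several acceptable GUID algorithms. The prefix also keeps the value a valid XML Schema ID, which cannot begin with a digit.

```python
import uuid


def new_element_id():
    """Return a fresh @id value: the literal prefix "id" followed by a random GUID."""
    return "id" + str(uuid.uuid4())


# A new value is generated once, when an element is created or copied;
# a moved element keeps the value it already has.
element_id = new_element_id()
```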
Whenever an element is moved, its `@id` attribute value must be preserved. When an element is copied, a new value for the `@id` attribute value must be generated for the new element created. Special care must be taken to ensure that the `@id` value is managed correctly. Proper management of the `@id` attribute value will provide a reliable handle upon which to attach other metadata such as commentary.
It is important that the value of the `@id` attribute not reflect, in any way, some aspect of the element that might change over time. For instance, if there is a number associated with an element and that number is subject to renumbering, then the `@id` attribute value should have no relation to the number that is subject to renumbering.
## 11.3 Temporal Identity
A `@temporalId` is a human-readable identity, scoped at the document level. While the `@id` attribute is defined to be unique in the document and constant throughout the lifetime of the element, and the schema enforces this uniqueness, the `@temporalId` attribute is defined loosely and changes to reflect the current location and numbering of the element over time.
Because the `@temporalId` attribute is assigned a value that reflects the current state of the element, special care must be taken to ensure that the value of the `@temporalId` attribute is recomputed anytime the state of the element changes. It is usually a good practice to recompute the `@temporalId` values whenever the document is committed or saved.
Ideally, the `@temporalId` attribute should be unique in a document, but this is not always possible due to various anomalies. For this reason, the uniqueness of the `@temporalId` attribute is not enforced by the schema, and some ambiguity is possible. How the disambiguation of duplicate names is handled is a subject that must be dealt with in the design of the software systems which will encounter this situation.
A recommended approach for computing the `@temporalId` value is to base the name on the hierarchy to get to the element, almost in a path-like fashion. The `@temporalId` value can be constructed as follows:
[parentId + "\_"] + [bigLevelPrefix] + num
Where:
parentId is the `@temporalId` value of the parent element. If the parent element does not have a `@temporalId` value or does not have a unique `@temporalId`, then the local XML name of the parent element is used, with special care being taken to ensure that all parent elements that are not unique have assigned `@name` values.
bigLevelPrefix = a prefix reflecting the level type, such as "p" for part, "d" for division, or "sd" for subdivision. For sections use "s". For small levels, no prefix is used.
num = the normalized value of the number or designation given to the level.
Exceptions:
The `` root level should be omitted from the computation.
The `` level should be omitted.
Levels of the hierarchy should be omitted whenever the numbering of a level does not require references to the higher levels. For example, section numbers are usually unique throughout the document, so it is not necessary to use the higher big levels to compute a name. So a section can be identified as simply "s1" rather than "p1\_d1\_s1".
For example, part III of subchapter II of chapter 12 of Title 8 would have a `@temporalId` of "ch12\_schII\_ptIII", and subparagraph (A) of paragraph (1) of subsection (a) of section 1201 of Title 8 would have a `@temporalId` of "s1201\_a\_1\_A".
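The construction rule can be sketched as a small helper that is applied level by level. The function name `temporal_id` and its parameter names are assumptions for illustration; the prefixes passed in below are simply those that appear in the two examples above.

```python
def temporal_id(num, prefix="", parent_id=None):
    """Build a @temporalId as [parentId + "_"] + [bigLevelPrefix] + num."""
    own = f"{prefix}{num}"
    # Omit the parent part when the level is unique on its own (e.g. sections).
    return f"{parent_id}_{own}" if parent_id else own


# Subparagraph (A) of paragraph (1) of subsection (a) of section 1201.
section = temporal_id("1201", prefix="s")
subsection = temporal_id("a", parent_id=section)
paragraph = temporal_id("1", parent_id=subsection)
subparagraph = temporal_id("A", parent_id=paragraph)

# Part III of subchapter II of chapter 12.
chapter = temporal_id("12", prefix="ch")
subchapter = temporal_id("II", prefix="sch", parent_id=chapter)
part = temporal_id("III", prefix="pt", parent_id=subchapter)
```

Small levels pass no prefix, and a section starts a fresh chain because its number is assumed to be unique within the document.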
## 11.4 Local Names
Local names are usually related to the parent element or container in which they are found. This is the purpose of the `@name` attribute. The most common use of the `@name` attribute is when naming a `` element. The `@name` attribute is also used to name a level within the local context of its parent level.
There is a problem with naming a level: its name is subject to change through time. This is because levels are subject to renumbering. To support this, the `@name` can be defined in a parameterized way. The parameters will need to be evaluated whenever a document is requested for a specific point-in-time.
The parameters are specified within the `@name` attribute value using a curly braces notation. Two parameters can be specified:
1. Use the {num} parameter to include the current normalized value (i.e., the `@value` attribute of the `` element) in the name of the level.
2. Use the {index} parameter to include the 1-based index position, calculated against other elements of the same type at the same level.
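Evaluating these two parameters amounts to a simple substitution. The sketch below assumes a hypothetical helper `expand_name` and hypothetical templates such as `sec{num}`; the actual templates used in a document are a design decision.

```python
def expand_name(template, num, index):
    """Substitute the {num} and {index} parameters in a @name template.

    num   -- the current normalized value of the level's number
    index -- the 1-based position among siblings of the same type
    """
    return template.replace("{num}", str(num)).replace("{index}", str(index))


by_number = expand_name("sec{num}", num="1201", index=3)
by_position = expand_name("item{index}", num="a", index=2)
```

Because the result depends on the current numbering, the expansion has to be repeated for each point-in-time that is requested.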
To better ensure the uniqueness of the `@name` attribute values generated in the future, a rational scheme must be designed. This is important because the `@name` attribute is also used in the mapping of links or references to elements. This process is accomplished by a web server add-in called a "resolver". Resolvers are described in the next chapter.
## 11.5 Identifiers
An `@identifier` is used on the root element to specify the URL reference to the document root. The `@identifier` is specified as an absolute path in accordance with the rules of the Referencing Model described in this document.
An `@xml:base` attribute is also specified on the root element to specify the location of a preferred resolver capable of resolving a reference. The `@xml:base` concatenated with the `@identifier` forms a complete URL reference.
Typically, the `@identifier` will be established on the root element and all level elements.
# 12 Referencing Model
## 12.1 Concept
References are a machine-readable format for making very precise citations or establishing links between different things in a document. The prevailing method for establishing references is to use HTTP-based hyperlinks, using the familiar technology prevalent on websites.
These references are, like websites, modeled as Uniform Resource Locators (URLs). A URL is a string representing a hierarchical path down to the item being requested, using forward slashes "/" as hierarchical separators. In normal websites, each level in the URL represents a folder, terminating in a file that is being requested. URLs can be specified in one of three ways: (1) global references starting with "http://{domain}"; (2) absolute paths starting with "/"; or (3) relative references starting with "./". Absolute paths typically use the local domain as the context for the URL, while relative references use the current file as the context.
USLM references use a variation of the absolute path technique. All references thus start with a forward slash "/". However, rather than representing folder and files, the hierarchical path represents a conceptual hierarchy down to the item in question. This path is known as a logical path. The logical path does not represent the folder/file hierarchy as with a physical path. In fact, there may be no physical path for information stored in a database rather than in a file system.
Web servers usually handle the task of interpreting a URL and retrieving the requested file from the file system. With USLM references, however, the mapping is not so straightforward. A web server must interpret the logical path in the URL and retrieve the requested information from a database. This task is accomplished by a web server add-in called a "resolver".
How the resolver is constructed depends on the web server being used and the storage format for the documents. All modern web servers provide some form of facility to allow a resolver to be constructed. This issue is discussed in greater detail below under Reference Resolver.
## 12.2 URL References
The International Federation of Library Associations and Institutions (IFLA) (http://www.ifla.org/) has developed a conceptual entity-relationship model for organizing bibliographic records (like index cards at a library). This model is called the Functional Requirements for Bibliographic Records (FRBR – pronounced "_Ferber_"). FRBR creates the conceptual framework for the USLM references.
References in USLM are composed using the following format:
> [item][work][lang][portion][temporal][manifestation]
Where:
- item – identifies the location of an instance. For non-computer locations, this is expressed as an http domain. An example would be [http://uscode.house.gov](http://uscode.house.gov).
- work – identifies the logical hierarchy down to the document being referenced. This hierarchy starts by identifying the jurisdiction ("/us" for United States) and continues by identifying the document ("/usc/t5" for Title 5). The jurisdiction is included in order to distinguish between the library that serves the document and the jurisdiction where the document originated. With this approach, it is possible for a library to serve a document from a different jurisdiction.
- lang expression ("!" prefix) – identifies the language. If the lang is not specified, then the language is assumed to be the language of the referencing document or referencing environment.
- portion ("/" prefix) – extends the work hierarchy to identify an item within the document. For example, "/s1/a/2" for paragraph (2) of subsection (a) of section 1 in the main body. Note that the portion is an easy mapping of the `@temporalId` for that element which is "s1\_a\_2". This gives a hint for how to resolve the portion part of a URL identifier.
- temporal expression ("@" prefix) – the date/time is expressed according to ISO 8601 ("@2013-05-02" for May 2, 2013). If the "@" is specified, but without a date/time, then the reference is to the current time. If no temporal expression is specified, the context may be used to identify the point-in-time, which is usually the date of the document making the reference.
- manifestation ("." prefix) – identifies the format as a simple file extension (".xml" for the XML file, ".htm" for HTML, and ".pdf" for the PDF).
Examples:
- /us/usc/t5/s1/a – the current version of subsection (a) of section 1 of title 5.
- /us/usc/t5/s1/a@2013-05-02 – the version of subsection (a) of section 1 of title 5 that was in effect on May 2, 2013.
- /us/usc/t5/s1/a.htm – the current version of subsection (a) of section 1 of title 5, rendered as HTML.
- http://uscode.house.gov/download/us/usc/t5/main/s1/a.htm – the current version of subsection (a) of section 1 of title 5, rendered as HTML and delivered from http://uscode.house.gov/download.
Notes:
- References in documents (using the `@href` and `@src` attributes) should always be stored as absolute paths, which omit the item part. This allows the reference to be independent of the site hosting the document. The role of the resolver is to determine which location can best serve the desired item. This allows a document to be moved from one digital library to another without changing the references within the XML. The item location is implicit in the library containing the reference. An exception is when a specific item is desired, usually when referencing an item from a foreign jurisdiction.
- There are generally two common methods to identify a document's type. One method is by extension as described above. The other method is to use the MIME type. The MIME type is a more robust solution because it allows for a wide variety in file extensions for the same type (e.g., ".htm" or ".html" for HTML files). However, file extensions are simpler and less cumbersome. This means that an agreed upon registry of file extensions should be maintained by the system.
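The reference format lends itself to a simple parser. The following sketch handles only absolute paths with an optional temporal expression and an optional manifestation; it does not handle the item (domain) or lang parts, and it assumes that the date/time itself contains no "." character. The names `Reference`, `parse_reference`, and `portion_to_temporal_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    path: str                      # work plus portion, e.g. "/us/usc/t5/s1/a"
    temporal: Optional[str]        # None if no "@"; "" means "current time"
    manifestation: Optional[str]   # file extension without the ".", or None


_REF = re.compile(r"(?P<path>/[^@.]+)(?:@(?P<temporal>[^.]*))?(?:\.(?P<ext>\w+))?")


def parse_reference(ref):
    """Split an absolute-path reference into path, temporal, and manifestation."""
    match = _REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not an absolute-path reference: {ref!r}")
    return Reference(match["path"], match["temporal"], match["ext"])


def portion_to_temporal_id(portion):
    """Map a portion such as "/s1/a/2" to the matching @temporalId "s1_a_2"."""
    return portion.strip("/").replace("/", "_")
```

For instance, `parse_reference("/us/usc/t5/s1/a@2013-05-02")` separates the path `/us/usc/t5/s1/a` from the date `2013-05-02`. Splitting the path further into work and portion requires knowledge of the document collection, so it is left out here.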
## 12.3 Reference Attributes
There are four attributes which contain references or portions of references:
1. `@href` – This is a pointer or link to another document. It is generally stored as an absolute path. Prepending the domain to identify a particular instance or library from which the information is to be sourced is left to the local resolver. This allows a document to be relocated in another digital library without changing all the references.
2. `@portion` – Often the textual representation of a reference is scattered in several places in a document. For instance, a set of amendments might be prefaced with an identification of the document affected, such as title 5 of the United States Code, while the individual amendments might specify only a portion of that document, such as subsection (a) of section 1. The `@portion` attribute allows a reference to be extended. This will generally be constructed as follows:
```
…
```
The example above shows an initial reference to title 5, United States Code. The second reference refers to the first, acquiring the first reference's `@href` and then extending it with the `@portion` to produce the reference /us/usc/t5/s1/a. This approach can be recursive.
3. `@src` – in addition to pointing to other documents, it is often desirable to embed other documents within a primary document. The `@src` attribute is used in this case.
4. `@origin` – When a fragment of a document is copied into another document, and it is necessary to record the place from which the fragment was copied, the `@origin` attribute is used. This exists for the `` and `` elements.
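The extension of an `@href` by a `@portion`, described in item 2 above, can be sketched as a path join. The helper name `extend_reference` is hypothetical, and the sketch assumes the portion may be written with or without its leading "/".

```python
def extend_reference(base_href, portion):
    """Extend a referenced @href with a @portion to form a longer reference."""
    return base_href.rstrip("/") + "/" + portion.lstrip("/")


# Title 5, extended first to section 1 and then to subsection (a).
title = "/us/usc/t5"
section = extend_reference(title, "/s1")
subsection = extend_reference(section, "a")
```

Applying the function to its own result mirrors the recursive case, where an extended reference is itself extended by a further `@portion`.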
## 12.4 Referencing Nomenclature
The following case-insensitive referencing nomenclature is used:
| Short Form | Long Form | Description |
| --- | --- | --- |
| pl[0-9]+ | publicLaw[0-9]+ | Public Law + number – Statute |
| t[0-9\|a-z]+ | title[0-9\|a-z]+ | Title + number |
| st[0-9\|a-z]+ | subtitle[0-9\|a-z]+ | Subtitle + number |
| ch[0-9\|a-z]+ | chapter[0-9\|a-z]+ | Chapter + number |
| sch[0-9\|a-z]+ | subchapter[0-9\|a-z]+ | Subchapter + number |
| p[0-9\|a-z]+ | part[0-9\|a-z]+ | Part + number |
| sp[0-9\|a-z]+ | subpart[0-9\|a-z]+ | Subpart + number |
| d[0-9\|a-z]+ | division[0-9\|a-z]+ | Division + number |
| sd[0-9\|a-z]+ | subdivision[0-9\|a-z]+ | Subdivision + number |
| s[0-9\|a-z]+ | section[0-9\|a-z]+ | Section + number |
| art[0-9\|a-z]+ | article[0-9\|a-z]+ | Article + number |
| r[0-9\|a-z]+ | rule[0-9\|a-z]+ | Rule + number |
| [a-z]+ | [a-z]+ | Subsection letter |
| [0-9]+ | [0-9]+ | Paragraph number |
| [A-Z]+ | [A-Z]+ | Subparagraph Letter (capital letters) |
| [i-x]+ | [i-x]+ | Clause (lower case roman numeral) |
| [I-X]+ | [I-X]+ | Subclause (upper case roman numeral) |
| [aa-zz]+ | [aa-zz]+ | Item (double lower case letter) |
| [AA-ZZ]+ | [AA-ZZ]+ | Subitem (double upper case letter) |
| [aaa-zzz]+ | [aaa-zzz]+ | Subsubitem (triple lower case letter) |
| (suppress) | main | Main body |
| shortTitle | shortTitle | Short title |
| longTitle | longTitle | Long title |
| preamble | preamble | Preamble |
| proviso | proviso | Proviso |
| app[0-9]\* | appendix[0-9]\* | Numbered or unnumbered appendix |
>**Note:** _The prefixes are defined to be case-insensitive. This is done as case-sensitive URLs can be problematic in some environments._
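A resolver that canonicalizes references might rewrite long forms to short forms using the table above. The sketch below is one possible reading of that table, not a prescribed algorithm: the names `canonicalize_segment` and `canonicalize_path` are hypothetical, only the prefix is matched case-insensitively, the designation is left unchanged, and a "main" segment is dropped because its short form is "(suppress)".

```python
import re

# Long-form prefixes (lowercased) and their short forms, from the table above.
LONG_TO_SHORT = {
    "publiclaw": "pl", "title": "t", "subtitle": "st", "chapter": "ch",
    "subchapter": "sch", "part": "p", "subpart": "sp", "division": "d",
    "subdivision": "sd", "section": "s", "article": "art", "rule": "r",
    "appendix": "app",
}

_LONG = re.compile(
    r"(" + "|".join(sorted(LONG_TO_SHORT, key=len, reverse=True)) + r")([0-9a-z]*)",
    re.IGNORECASE,
)


def canonicalize_segment(segment):
    """Rewrite one path segment from long form to short form if it matches."""
    match = _LONG.fullmatch(segment)
    if match is None:
        return segment
    name, designation = match.group(1).lower(), match.group(2)
    # Only an appendix may be unnumbered; other prefixes need a designation.
    if not designation and name != "appendix":
        return segment
    return LONG_TO_SHORT[name] + designation


def canonicalize_path(path):
    """Canonicalize every segment of a reference path and suppress "main"."""
    segments = [s for s in path.split("/") if s.lower() != "main"]
    return "/".join(canonicalize_segment(s) for s in segments)
```

With these assumptions, `canonicalize_path("/us/usc/title5/main/section1/a")` produces `/us/usc/t5/s1/a`.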
## 12.5 References within Amendment Instructions
Amendments refer to the item that they are amending. The reference may be complex, specifying not only the item affected, but a relative position either within, before, or after the item affected. Three additional attributes are provided with references to allow this sort of specification:
- `@pos` – Specifies a position that is either at the start, before, inside, after, or at the end of the context item.
- `@posText` – Establishes the context for the position relative to text contained within the referenced item.
- `@posCount` – Specifies which occurrence of the `@posText` within the referenced item is being acted upon. By default, the first occurrence is assumed. In addition to specifying which occurrence, the values all, none, first, and last may also be used.
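To illustrate how `@posText` and `@posCount` interact, the sketch below locates occurrences of a text fragment inside the text of a referenced item. The function name `find_positions` is hypothetical, the text is treated as a plain string, and numeric `@posCount` values are assumed to be 1-based; none of these details are fixed by the description above.

```python
def find_positions(text, pos_text, pos_count="first"):
    """Return the start offsets of pos_text in text selected by pos_count."""
    if not pos_text:
        return []

    # Collect every non-overlapping occurrence, left to right.
    offsets = []
    start = text.find(pos_text)
    while start != -1:
        offsets.append(start)
        start = text.find(pos_text, start + len(pos_text))

    if pos_count == "all":
        return offsets
    if pos_count == "none":
        return []
    if pos_count == "first":
        return offsets[:1]
    if pos_count == "last":
        return offsets[-1:]

    # Otherwise treat pos_count as a 1-based occurrence number (assumption).
    n = int(pos_count)
    return offsets[n - 1:n] if n >= 1 else []


sample = "the term and the term"
first_hit = find_positions(sample, "term")
second_hit = find_positions(sample, "term", "2")
```

An empty result signals that the requested occurrence does not exist, which an amending tool would need to report rather than silently ignore.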
## 12.6 Reference Resolver
The URL-based references that are established create the links between various documents within the system. A software component is added to the web server to interpret the references, find the relevant piece within the database repository, extract it, and perform any necessary assembly and transformation before returning the result to the requester. This web server add-in is called a resolver. How it is built is determined by the web servers being used. In general, the resolver will perform the following sequence of functions:
1. It will receive a reference from a requestor.
2. It will canonicalize the reference, normalizing the text to match one of the forms it understands.
3. If the reference is to the U.S. jurisdiction and the resolver understands the reference, then it will attempt to resolve it by retrieving the XML from the document. This might be either an entire document or a fraction thereof.
4. If the reference is to another jurisdiction, and the resolver is able to resolve the reference, either locally or by deferring to another web server, then the resolver will resolve the reference that way.
5. If the reference is not understood, then the resolver will return a _404 – file not found_ error.
6. If the document is being resolved locally, and the XML has been extracted from the database, then it may need to be assembled or otherwise processed to ensure that the correct temporal state has been established. If no temporal information is contained in the URL, then the present state is assumed.
7. Once the correct XML has been created, if a format other than XML has been requested it will need to be transformed and/or converted into the correct format. This may involve transforming the XML into HTML or creating a PDF.
8. Some of the steps above may be circumvented in the interest of performance and efficiency with a good caching strategy.
9. Once the requested item has been retrieved, assembled, and transformed, it is returned to the requestor using HTTP.
There are several strategies that the resolver can use to find the item referenced by the work part of the reference URL:
1. The fastest method, if there is a reliable mapping between the `@name` value and the work part of the reference URL, is to map between the reference path and the `@name`. This approach is best when the XML documents are shredded into parts and stored as separate items, either in the file system or in a relational database.
2. Another strategy is to rewrite the reference URL hierarchy as an XPath query. This approach is best when there is a good mapping between the reference hierarchy and the document hierarchy, and the information is stored in an XML repository that supports XPath. Performance might be an issue for more complex XPath queries.
3. The third strategy is to create an indexing mechanism. This solution might rely on the inherent capabilities of the chosen database or repository, or it might be some sort of predefined mapping. How this strategy should ultimately be designed is beyond the scope of this User Guide.
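As one illustration of the indexing strategy, the sketch below builds an in-memory index from `@temporalId` values to elements and looks up the portion part of a reference against it. This is only a toy example under several assumptions: the sample XML is invented and carries no namespace, the helper names `build_index` and `lookup` are hypothetical, and a production resolver would more likely rely on the indexing facilities of its database or repository.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample document; real USLM documents are namespaced and far richer.
SAMPLE = """
<section temporalId="s1">
  <subsection temporalId="s1_a">
    <paragraph temporalId="s1_a_2">Text of paragraph (2).</paragraph>
  </subsection>
</section>
"""


def build_index(root):
    """Map each @temporalId to the elements carrying it.

    A list is kept per key because @temporalId uniqueness is not enforced.
    """
    index = defaultdict(list)
    for element in root.iter():
        temporal_id = element.get("temporalId")
        if temporal_id:
            index[temporal_id].append(element)
    return index


def lookup(index, portion):
    """Find the elements for a portion such as "/s1/a/2"."""
    key = portion.strip("/").replace("/", "_")
    return index.get(key, [])


root = ET.fromstring(SAMPLE)
index = build_index(root)
hits = lookup(index, "/s1/a/2")
```

When `lookup` returns more than one element, the resolver has to apply whatever disambiguation policy the system defines for duplicate `@temporalId` values.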
For a specific document, the preferred resolver is identified using the `@xml:base` attribute on the root element. For instance:
`xml:base="resolver.mydomain.com"`
The `@xml:base` concatenated with the `@identifier` forms a complete URL reference.
A preferred resolver does not currently exist for USLM. Therefore, the `@xml:base` attribute is not provided in current USLM documents. The United States House intends to provide a resolver in the future. If and when that occurs, the `@xml:base` attribute will point to that resolver.
# 13 Metadata Model
## 13.1 Concept
In addition to the text in an XML document, there is also a need to store a significant amount of metadata about a document. There are a few ways in which this metadata might be stored:
1. Within the document in a separate partition.
2. Scattered within the document.
3. In a separate file.
4. In a relational database.
All four of these approaches can be supported. First, there is an optional `` block defined at the start of the document. Within this block, properties and sets of properties can be stored. The model for this metadata is open and extensible to support a wide range of needs, while also keeping the core concepts very simple. The metadata stored here can either be generated in an ongoing fashion, or as the result of an analysis of the text after it has been committed, or as a combination of these.
In addition to the basic `` block, attributes are provided throughout the document for storing metadata about a particular element with that element. Most of these attributes have prescribed usage, and the model is not as general and flexible as the `` block. However, there are a few attributes set aside for unprescribed uses. These include the `@misc`, `@draftingTip`, and `@codificationTip` attributes.
It is possible to store metadata in a separate file or in a relational database. If a separate file is chosen, no format for this file is prescribed. It can be an XML file, some other text file, or even a binary file. One option for the format is to borrow the `` tag with its `` and `` children from USLM. This is merely an option; it is not prescribed.
If the information is stored in a separate file or is stored in a database, then it may be necessary to maintain a strict association between the XML elements and the records in the file. For this reason, element `@id` values are defined to be immutable, in order to provide a reliable handle for making associations. If the `@id` attribute cannot be managed reliably, then the separate file and database options should be avoided.
## 13.2 Properties
Properties are basic elements that may or may not have string content. The `@name` attribute acts as the primary identification for a ``. The `@value` attribute (and its range siblings) or the `@date` attribute (and its range siblings) are used to hold normalized values or dates. Sometimes, a value or date might exist as both text content in the element and, in a normalized form, as an attribute.
Properties are primarily intended for use within the ` |