Acorn URL Fetcher API Specification

Document Status

Distribution:   General Release
Title:   Acorn URL Fetcher API Specification
Drawing Number:   1215,220/FS
Issue:   0.25
Author(s):   Paul Wain
  Carl Elkins
  Stewart Brodie
  Andrew Hodgkinson
Date:   12/11/1998
Revision:   N/A
Change Number:   ECO 4131
Last Issue:   0.24 (04/08/1998)

Contents

  1. Document Status
  2. Issue / revision history
  3. Overview
  4. Outstanding issues

  5. Client to URL module interface

    1. URL_Register
    2. URL_GetURL
    3. URL_Status
    4. URL_ReadData
    5. URL_SetProxy
    6. URL_Stop
    7. URL_Deregister
    8. URL_ParseURL

      1. 0: URL_ParseURL_ReturnLengths
      2. 1: URL_ParseURL_ReturnData
      3. 2: URL_ParseURL_ComposeFromComponents
      4. 3: URL_ParseURL_QuickResolve

    9. URL_EnumerateSchemes
    10. URL_EnumerateProxies

  6. Protocol module to URL module interface

    1. URL_ProtocolRegister
    2. URL_ProtocolDeregister

  7. URL module to Protocol module interface

    1. Protocol_GetData
    2. Protocol_Status
    3. Protocol_ReadData
    4. Protocol_Stop

  8. URL module service calls

    1. Service_URLProtocolModule

      1. 0: URLModuleStarted
      2. 1: URLModuleDying

    2. Service_URLProtocolModule_ProtocolModule

  9. URL module *-commands

    1. URLProtoShow

  10. URL errors

  11. Performance targets
  12. Glossary
  13. References

    1. RFC 1630
    2. RFC 1738
    3. RFC 1808
    4. RFC 1945
    5. RFC 2068
    6. RFC 959
    7. RFC 977
    8. RFC 1625
    9. 1215,215/FS

Issue / revision history

1215,2201 (Developers only)
0.16 19/10/1997 First formal version of specification based on uncontrolled textual programmer's notes (RCE)
0.16a 20/10/1997 Incorporated notes from ADH and SB (RCE)
0.19 17/11/1997 Incorporated details of service calls (SNB)
0.20 20/11/1997 Incorporated details of URL parsing SWI (SNB)
0.21 11/06/1998 All other updates incorporated (SNB)
0.22 22/06/1998 Comments after first review incorporated. Added details of proxy enumeration SWI (SNB)
0.24 04/08/1998 No longer live. ECO 4082. (SNB)
0.25 12/11/1998 Four digit years on all dates. Tidied up white space. Removed smart quotes and n-dashes. Added author details to history. Corrected references on R0 exit words from URL_ParseURL to URL_Status. Added details of bit 1 of flags word in R0 to URL_ParseURL. Clarified a few sentences here and there. ECO 4131. (ADH)

Overview

The URL (Universal Resource Locator) module is a general purpose module for fetching data from various Internet services. This specification reflects the behaviour of version 0.42 or later of the URL_Fetcher module. The purpose of the module is to provide a uniform entry point into a set of "fetcher" protocols (e.g. FTP, HTTP, Gopher, NNTP, etc.), without the need for a client application to understand how each protocol works. This is done using a number of generalised URL SWIs. The fetcher protocol modules (hereafter just "protocol modules") with which the URL module communicates are called only by the URL module itself. The entry points into the protocol modules have similar names to the entry points into the URL module, but they are NOT the same. The system structure is shown in figure 1 below.

      /----------------\                
      |  Applications  |                
      \----------------/                
              |                         
              |                         
              v                         
 /---------------------------\          
 |       URL module          |          
 \---------------------------/          
       ^  |         ^  |                
       |  |         |  |                
       |  v         |  v                
   /----------\ /----------\            
   |   HTTP   | |   FTP    |  . . . . . 
   \----------/ \----------/            

Figure 1: URL Fetching system structure

Each client fetch occurs within the context of a 'session'. Each session is identified by a different session identifier. Client session identifiers are issued by the URL module upon request and remain valid until the client informs the URL module to discard the session. Subsequently, session identifiers may be re-issued by the URL module for new sessions. Only a single object fetch can be performed in any one given session. Sessions cannot be re-used by clients, even if a prior object fetch in that session has completed.

The typical client usage of the system is:

If an application decides it requires a premature termination (eg. the user asked the application to quit whilst an object was being downloaded), then the application calls SWI URL_Stop immediately and then discards the session with SWI URL_Deregister. Typical clients, such as web browsers, will, most likely, have several sessions active concurrently.
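The session lifecycle described in this overview can be sketched as follows. This is only an illustration under stated assumptions: `FakeUrlModule` is a hypothetical in-memory stand-in whose method names merely mirror the SWI names, and a real client would issue the SWIs themselves. An early exit would additionally call the equivalent of URL_Stop before deregistering.

```python
# Hypothetical stand-in for the URL module; method names mirror the SWIs,
# but the class itself is an assumption made purely for illustration.
ALL_DATA_RECEIVED = 1 << 5  # status bit 5 (see URL_Status)


class FakeUrlModule:
    def __init__(self, payload: bytes):
        self._payload = payload
        self._sessions = {}   # session id -> read offset (None before a fetch)
        self._next_id = 1

    def register(self) -> int:                           # cf. URL_Register
        session = self._next_id
        self._next_id += 1
        self._sessions[session] = None
        return session

    def get_url(self, session: int, url: str) -> None:   # cf. URL_GetURL
        if self._sessions[session] is not None:
            raise RuntimeError("only one fetch per session")
        self._sessions[session] = 0

    def read_data(self, session: int, size: int):        # cf. URL_ReadData
        start = self._sessions[session]
        chunk = self._payload[start:start + size]
        self._sessions[session] = start + len(chunk)
        remaining = len(self._payload) - self._sessions[session]
        status = ALL_DATA_RECEIVED if remaining == 0 else 0
        return status, chunk, remaining

    def deregister(self, session: int) -> None:          # cf. URL_Deregister
        del self._sessions[session]


def fetch(module: FakeUrlModule, url: str, bufsize: int = 8) -> bytes:
    """Register, start one fetch, drain the data, then discard the session."""
    session = module.register()
    try:
        module.get_url(session, url)
        received = bytearray()
        while True:
            status, chunk, _remaining = module.read_data(session, bufsize)
            received += chunk
            if status & ALL_DATA_RECEIVED:
                return bytes(received)
    finally:
        module.deregister(session)   # a session is never re-used
```

Deregistering in a `finally` clause reflects the rule that a session is discarded after its single fetch, whether or not the fetch completed.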

In many of the SWI interfaces to the protocol modules, the URL module passes its own session identifiers. These are not the identifiers known to the client application - the URL module maintains its own private sessions into the protocol modules. Service calls are also provided to ease interaction between the URL module and the fetchers, mainly to inform other modules of the arrival or departure of a particular module.

Each protocol module accepts data and returns results as per the HTTP protocol. Thus any extra client data associated with a request (passed in R4 to SWI URL_GetURL) will take the format of a (possibly empty) set of HTTP headers, an empty line and then the data; and each response will start with an HTTP/1.0 or HTTP/1.1 Response-Line of the format: "HTTP/1.0 200 OK" followed by various headers identifying the content-type of the retrieved data, followed by an empty line, followed by the data itself.
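As an illustration of the response layout just described, a client might split the returned data along these lines. This is a sketch only: the function name is hypothetical, and tolerating both CRLF and bare LF line endings is an assumption, since the line ending emitted by a given protocol module is not stated here.

```python
def split_response(raw: bytes):
    """Split an HTTP-style response into (status code, headers, body)."""
    # Assumption: accept either CRLF or bare LF, using whichever empty
    # line appears first in the data.
    candidates = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    candidates = [(pos, sep) for pos, sep in candidates if pos >= 0]
    if not candidates:
        raise ValueError("no empty line terminating the headers")
    pos, sep = min(candidates)
    head, body = raw[:pos], raw[pos + len(sep):]

    lines = head.decode("latin-1").splitlines()
    # First line looks like "HTTP/1.0 200 OK".
    _version, code, *_reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body
```

Because every protocol module reports results in this form, the same routine could be applied to data fetched over any scheme.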

Outstanding issues

There are no outstanding issues.

Client to URL module interface

A typical client would be an application, such as a Web Browser. The following SWI calls provide the interface for an application to control and transfer data via the URL module.

URL_Register
(SWI &83E00)

Initialise a client session with the URL module.

On entry

R0= Flags: All bits currently reserved (must be zero).

On exit

R0=Reserved - currently zero.
R1=Session identifier.

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This SWI initialises a client session with the URL module and provides the client with a session identifier that can be used to monitor the status of the URL module within that client's context. The session identifier is unique for each client session that is registered with URL and is also used as an identifier in subsequent interactions with the URL module.

Multiple registration by the same client application is permitted. This will provide the client with multiple identifiers to the URL module. Calling this SWI does not result in the calling of any protocol module SWIs.

The URL module imposes no limit on the number of concurrently registered sessions, other than having the required memory available in which to store details of the session.

Related SWIs

URL_Deregister

Related vectors

None

URL_GetURL
(SWI &83E01)

Instigate data transfer from / to a resource server.

On entry

R0= Flags:
Bit(s) Meaning
0 If set, R6 is valid.
1  If set, R5 holds length of data in R4 specified buffer, otherwise a single NUL terminated string in buffer.
2-31 Reserved (must be zero).
R1=Session identifier.
R2= Bitfield:
Bit(s) Meaning
0-7  Method (8-bit value, held in bits 0-7).
This is protocol dependent. See the table below for values.
8-15 Method dependent.
16-31 Reserved (must be zero).
R3= URL - the document we are after, including the protocol. For example "http://www.acorn.co.uk/".
R4= Data block - data to send in addition to the URL. Validity is protocol and method dependent.
R5= If R0:1 is set, length of data in R4 data block.
If R0:1 is clear, must be 2.
R6= User Agent - Pointer to string to use as 'User Agent' identifier in request header if R0:0 is set. (NULL pointer or NULL string implies use default identifier - see below).

On exit

R0= Protocol status (see SWI URL_Status, below).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This SWI is used to instigate a transfer of data to or from (mainly from) a resource server. When this SWI has been called, the URL module checks the per-session and global proxy settings, looking for a match (see SWI URL_SetProxy for details on setting proxies and proxy conflict resolution). If no proxy is to be used, then URL looks for a protocol module which is capable of handling the URL specified by R3. If a proxy setting was found, then a pointer to the proxy URL is placed in R7, R0:31 is forced to value 1, and URL looks for a protocol module which is capable of handling the specified proxy URL. In both cases, if a suitable module cannot be located, the URL module generates an error. If a protocol module capable of handling the URL was found, then all client registers are passed onto the protocol module via the Protocol_GetData SWI call with the exceptions stated above for proxy handling. On exit, R0 will hold the status code returned by the protocol module.

The extra data pointed to by R4 on entry is method and protocol specific. For example, in HTTP, the data comprises HTTP headers and, if appropriate, an entity body. Protocol modules should use this style wherever possible. Note that these headers do not include lines such as an HTTP Request-Line (ie. the "GET / HTTP/1.0" part). For example, when posting data to an HTTP URL as the result of a form submission on a web page, the web browser would supply a Content-Type header, Content-Length header, potentially some kind of encoding header, a blank line and then the entity body.
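A minimal sketch of assembling such a data block for a form submission is given below. The function name is hypothetical and the use of CRLF line endings is an assumption borrowed from HTTP. The resulting block would be passed in R4 with R0:1 set and its length in R5.

```python
def build_post_block(content_type: str, body: bytes) -> bytes:
    """Return headers, an empty line, then the entity body.

    No Request-Line is included; only the headers and the body are
    supplied by the client.
    """
    headers = [
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    # Assumption: CRLF line endings, as in HTTP itself.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body
```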

The User Agent string pointed to by R6, if R0:0 is set, is an indication to the underlying protocol module of how the module should identify itself to remote systems. This controls the User-Agent header for the HTTP protocol module, for example. The protocol module is free to define its default identifier as it pleases; however, following the format of the HTTP User-Agent is recommended where possible and appropriate to the protocol. Modules may choose to ignore or amend any User-Agent string. For example, the AcornHTTP module will suffix the client's User-Agent with its own version number, resulting in complete identifiers such as:

  User-Agent: Acorn Browse/2.06 AcornHTTP/0.82

where the client only specified "Acorn Browse/2.06".

Table of method numbers:

No.  FTP        HTTP and others  Comment
1    RETR/LIST  GET              "Get this object" operation
2    n/a        HEAD             "Get entity headers" operation
3    n/a        OPTIONS          "Get server options" operation
4    n/a        POST             "HTTP POST" operation
5    n/a        TRACE            "HTTP TRACE" operation
6    n/a        n/a              Reserved to Acorn - do not use
7    n/a        n/a              Reserved to Acorn - do not use
8    STOR       PUT              "Store this object" operation
9    MKD        n/a              "Create directory" operation
10   RMD        n/a              "Remove directory" operation
11   RNFR/RNTO  n/a              "Rename object" operation
12   DELE       DELETE           "Delete object" operation
13   STOU       n/a              "Store object unique" operation

Applications for new method codes should be made to Developer Support. The range 128-254 is reserved for private non-distributed modules. Method numbers 0 and 255 are reserved and must not be used.

The FTP-specific methods listed above are fully implemented in version 0.28 of the FTP Fetcher module. The HTTP-specific methods listed above are fully implemented in version 0.82 of the AcornHTTP module.
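The layout of the R2 bitfield and the reserved method numbers can be illustrated with a short sketch. The names `METHODS` and `method_word` are hypothetical; the values are taken from the HTTP column of the method table above.

```python
# HTTP column of the method table; the dictionary itself is illustrative.
METHODS = {"GET": 1, "HEAD": 2, "OPTIONS": 3, "POST": 4, "TRACE": 5,
           "PUT": 8, "DELETE": 12}


def method_word(method: int, method_dependent: int = 0) -> int:
    """Pack the R2 bitfield: method in bits 0-7, method dependent in 8-15."""
    if not 0 < method < 255:
        raise ValueError("method must be 1-254 (0 and 255 are reserved)")
    if method in (6, 7):
        raise ValueError("methods 6 and 7 are reserved to Acorn")
    if not 0 <= method_dependent <= 0xFF:
        raise ValueError("method dependent value must fit in bits 8-15")
    return method | (method_dependent << 8)   # bits 16-31 remain zero
```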

Related SWIs

URL_Register
URL_SetProxy
URL_Stop
URL_Deregister
Protocol_GetData

Related vectors

None

URL_Status
(SWI &83E02)

Obtain information on a session.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.

On exit

R0= Status word:
Bit(s) Meaning
0 Connected to server.
1 Sent request.
2 Sent data.
3 Initial response received.
4 Transfer in progress.
5 All data received.
6 Transfer aborted.
7-31 Reserved (must be zero).
R1=Preserved.
R2= Server response, as an "HTTP" response code (200, 401 etc.)
R3= Bytes read so far (total body data count).
R4= Total bytes to be transferred in whole transaction if known (approximate value only), or -1 if unknown.

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This SWI is used to monitor the transfer of data from a remote service. It is protocol independent - the exit status bits are common to all services. Clients must test this field bit-wise, since the value is cumulative.

Clients may not assume that the states returned in R0 will progress in any particular combination or order. However, the likely progression during a fetch for a resource being retrieved over a network (when the bits are combined into a single decimal value) is: 0,1,3,7,15,31 and then R0:5 set upon completion, and R0:6 set at any stage when an error has occurred.
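Bit-wise testing of the status word might look like the sketch below. The flag names are hypothetical labels for the bits listed under "On exit"; only the bit positions come from this specification.

```python
import enum


class Status(enum.IntFlag):
    CONNECTED = 1 << 0
    SENT_REQUEST = 1 << 1
    SENT_DATA = 1 << 2
    INITIAL_RESPONSE = 1 << 3
    TRANSFER_IN_PROGRESS = 1 << 4
    ALL_DATA_RECEIVED = 1 << 5
    ABORTED = 1 << 6


def is_finished(r0: int) -> bool:
    """True once the fetch has completed or been aborted."""
    return bool(r0 & (Status.ALL_DATA_RECEIVED | Status.ABORTED))


def describe(r0: int) -> list:
    """List the names of all status bits set in R0, lowest bit first."""
    return [flag.name for flag in Status if r0 & flag]
```

Testing individual bits, rather than comparing R0 against the values 0, 1, 3, 7, 15 and 31, keeps the client correct even when a protocol module skips or reorders states.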

Since each protocol module is returning its results according to the HTTP protocol, R2 can be treated as an HTTP response code whatever the URL being fetched. For example, the FileFetcher module will indicate file not found errors by setting the response code to 404 (HTTP's Not Found error code).

Note that in the case of, for example, an HTTP 403 (Forbidden) return, some explanatory data may be received, too. If the amount of data to be received is unknown, R4 will contain -1; however, R3 will contain the number of bytes received so far. The R4 value should be treated as approximate, since the exact interpretation varies between protocols.

When this SWI is called, the URL module invokes SWI Protocol_Status for the protocol module concerned with the request.

Related SWIs

URL_Register
URL_Deregister
Protocol_Status

Related vectors

None

URL_ReadData
(SWI &83E03)

Read data pending from a request.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.
R2=Client buffer for receiving data.
R3=Size of buffer pointed to by R2.

On exit

R0= Status word (see SWI URL_Status).
R2=Preserved. Contents of buffer modified.
R4=Number of bytes transferred to R2 buffer.
R5= Number of bytes still to be read to complete object (if known) or -1 if unknown.

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This SWI is used to read the data pending from a request, find out how much data has been read on this call and how much more there is remaining to be read for the request. R2 is a pointer to a buffer on entry (and R3 is the size of the buffer); on exit the buffer contains the new data, R4 contains the amount of data written to the buffer and R5 contains the amount of data left to be read. If the amount of data left is unknown R5 will contain -1. R0 always returns the protocol status code. In the event of all the data being read (R5 = 0 on exit), a call to URL_Stop is not required as this is performed automatically when URL_Deregister is called for the client session. Once all data has been read a call to URL_Status can return no meaningful information, simply indicating that the transfer has completed.

The data returned will take the form of a complete HTTP compatible response. Responses should use HTTP/1.0 if possible and avoid HTTP/1.1. For example, AcornHTTP will downgrade any higher version responses to HTTP/1.0, having taken care to remove any features applicable only to the higher version, such as chunked transfer encodings.

When this SWI is called, the URL module invokes the Protocol_ReadData SWI for the protocol module concerned with the request.

Related SWIs

URL_Register
URL_GetURL
URL_SetProxy
URL_Status
URL_Deregister
Protocol_GetData
Protocol_ReadData

Related vectors

None

URL_SetProxy
(SWI &83E04)

Set up a proxy server for a session with the URL module.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.
R2= Address of buffer containing a URL base.
R3= URL 'method' to proxy (address of URL fetch identifier to be proxied).
R4= 0 => Proxy request.
1 => Don't proxy request.
All other values reserved.

On exit

R0= Status word (see SWI URL_Status).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call is used to set up a proxy server to use for a session with the URL module. If R1 is zero then the proxy is considered global and is used for all sessions. If R1 is a valid session identifier then the proxy server for that session only is set. R2 is a pointer to a string containing the base URL to pass the request on to when a proxy request is made. This is of the form "http://www-cache.demon.co.uk:8080/" (note the trailing '/'). A common error is to omit the port number. If the port number is not specified, then the default port number is used. See discussion under URL_ProtocolRegister regarding how the default port number is derived.

R3 is a pointer to a buffer containing the initial part of the URL to proxy - the URL scheme (eg "http:", "ftp:"). This system has the advantage that requests to certain hosts can be proxied and not others (eg by giving "http://www.acorn.co.uk/" as the scheme). However, if R4 is 1, this indicates that no matter how the proxy settings have been defined, requests to the base URL should not be proxied in this case (R3 is undefined). When a URL_GetURL request is received, the proxy settings are evaluated in the following order:

Order  Description
1      Client no-proxy
2      Client proxy
3      Global no-proxy
4      Global proxy

This is to ensure all client settings override global settings and thus remain safe for the given client - ie. a client which sets up a proxy server and then defaults all other URLs to no-proxy, can, no matter how the global settings are changed, be sure of where requests will end up. If R2=0 on entry, then all proxy settings for the specified session are cleared.
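The evaluation order can be sketched as follows. This is not the module's implementation: the function and parameter names are hypothetical, and matching a setting by simple string prefix is an assumption based on R3 being described as the initial part of the URL to proxy.

```python
def choose_proxy(url, client_no_proxy, client_proxy,
                 global_no_proxy, global_proxy):
    """Return the proxy base URL to use for `url`, or None to fetch directly.

    The *_no_proxy arguments are lists of URL prefixes; the *_proxy
    arguments are lists of (URL prefix, proxy base URL) pairs.
    """
    # Client settings are consulted before global ones, and within each
    # level the no-proxy list is consulted before the proxy list.
    for no_proxy, proxies in ((client_no_proxy, client_proxy),
                              (global_no_proxy, global_proxy)):
        if any(url.startswith(prefix) for prefix in no_proxy):
            return None
        for prefix, proxy in proxies:
            if url.startswith(prefix):
                return proxy
    return None
```

For example, a client proxy for "http:" still applies when a global no-proxy entry for "http:" exists, because the client level is evaluated first.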

Calling this SWI does not result in any calls being made to protocol modules.

Related SWIs

URL_Register
URL_GetURL
URL_Deregister

Related vectors

None

URL_Stop
(SWI &83E05)

Abort a request placed with the URL module.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.

On exit

R0= Status word (see SWI URL_Status).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call aborts a current request if there is one associated with the session identifier. In the event of no request being associated with the identifier, an error is generated. The purpose of this SWI call is to provide the client with a way of enforcing the termination of a request. The client need not call it merely because all the data associated with the request has been transferred, although it may do so if it chooses. The URL_Stop call will be made automatically by the URL module when the session is deregistered by the client using SWI URL_Deregister.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the protocol module concerned with the request.

Related SWIs

URL_Register
URL_Deregister
Protocol_Stop

Related vectors

None

URL_Deregister
(SWI &83E06)

Deregister a client session with the URL module.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.

On exit

R0= Status word (see SWI URL_Status).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call deregisters the client session from the URL module, freeing up any information the URL module may have kept about the client session (eg proxy information). The session identifier ceases to be valid and becomes available for re-issue on a subsequent call to SWI URL_Register.

When this SWI is called, the URL module invokes the Protocol_Stop SWI for the protocol module concerned, if it has not already done so (e.g. during the processing of URL_Stop).

Related SWIs

URL_Register
URL_Stop
Protocol_Stop

Related vectors

None

URL_ParseURL
(SWI &83E07)

Parse URLs to / from their constituent parts.

On entry

R0= Flags:
Bit(s) Meaning
0  If set, R5 contains number of words in data block, else a default of 10 words is assumed.
1  If set, character codes 0 to 31 and 127 in the URL will be escaped (hex encoded, e.g. space becomes '%20') - only available in URL 0.42 or later. URL 0.38 through to 0.41 inclusive always escape these characters. Versions prior to 0.38 never do this.
2-31  Reserved (must be zero).
R1= Reason code:
0 => Return component buffer requirements.
1 => Return component data in specified buffers.
2 => Construct full URL from component buffers.
3 => 'Quick parse'.
R2=Pointer to base URL.
R3= Pointer to URL relative to base URL (or NULL if none).
R4= Pointer to data block of R5 words (unless R1 = 3, see below, or R0:0 is unset, in which case R4 points to a buffer of at least 10 words in length).
R5= If R0:0 set, size of R4 block in words.

If R3 is non-NULL, it is assumed to point to a partial URL which needs to be resolved with respect to the base URL pointed to by R2. If R3 is NULL, then R2 is assumed to point to a full URL.

On exit

R0= Flags: All bits currently reserved (must be zero).

All other registers preserved. Data block at R4 is updated in line with entry reason code.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This SWI is used to parse URLs into their constituent parts, enabling clients to extract the various fields from the URL in a reliable manner. The call is also capable of resolving a relative URL to produce a fully-qualified URL, and of reconstructing a full URL from a set of components.

The data block referred to above is either a block of integers which will be updated to contain the size of the required buffer for each element, or a block containing pointers to buffers for the actual data.

All strings are zero-terminated and all lengths include space for the zero terminator.

The number of entries in the block is specified in R5 if R0:0 is set on entry. If R0:0 is clear, then the default value of 10 is assumed. The format of the data block is:

Offset  Usage
+0      Fully canonicalised URL.
+4      URL protocol (e.g. "http", "ftp") forced to lower-case.
+8      Hostname (e.g. "www.acorn.com") forced to lower-case.
+12     Port (e.g. "80").
+16     Username - used for FTP authentication and mailto.
+20     Password - for FTP.
+24     Account - for FTP.
+28     Path (e.g. "pub/riscos/releases") (See note).
+32     Query - for HTTP, things after a query character.
+36     Fragment - for HTTP, things after a hash character.

It is anticipated that this SWI will be called twice: the first time to find the lengths of the buffers, and the second to retrieve a copy of the data into the buffers. The URLs pointed to by R2 and R3 (if used) need not be fully-qualified, e.g. R2 may point to "www.acorn.com/browser/". The fully canonicalised version of the URL at block+0 refers to a fully-qualified, canonicalised version of it, which in this example would be "http://www.acorn.com/browser/".

During canonicalisation, the port number will be elided if possible. See the discussion under SWI URL_ProtocolRegister for details of how URL discovers whether this is possible or not.

Note: The path will not start with a '/' unless the URL being parsed explicitly specified one - this is in keeping with the URL specification, so for example, given the URL "http://www.acorn.com/browser/", then the path component is "browser/", and not "/browser/"; the slash between the hostname and path is a separator only, not a part of either component.

The entry reason codes are described below.

Related SWIs

URL_ProtocolRegister

Related vectors

None

URL_ParseURL 0
(SWI &83E07)

URL_ParseURL_ReturnLengths: Work out space required for URL components.

Use

When R1 is 0 on entry to the SWI, the data block is treated as a block of unsigned 32-bit integers. The contents of the block are ignored on entry, but on exit are filled in with the lengths of the individual components of the URL. A value of zero is stored for a field which does not exist; non-zero values include space for a zero-byte terminator.
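To illustrate how a client might interpret the block returned by reason code 0 and prepare for reason code 1, consider the sketch below. The field names and helper functions are hypothetical, and the block is modelled as raw bytes holding ten little-endian words (the byte order being an assumption about the host).

```python
import struct

# Hypothetical names for the ten default entries at offsets +0 to +36.
FIELDS = ("full", "scheme", "host", "port", "user", "password",
          "account", "path", "query", "fragment")


def decode_lengths(block: bytes) -> dict:
    """Decode a 10-word block of unsigned 32-bit lengths into a dict."""
    words = struct.unpack("<10I", block)
    return dict(zip(FIELDS, words))


def buffers_for(lengths: dict) -> dict:
    """Allocate one buffer per field that exists.

    A length of zero means the field does not exist; None stands in for
    the zero pointer that would be passed for it with reason code 1.
    """
    return {name: bytearray(size) if size else None
            for name, size in lengths.items()}
```

Since the reported lengths already include the zero terminator, the buffers can be allocated at exactly the sizes returned.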

Related SWIs

URL_ParseURL
URL_ParseURL 1
URL_ParseURL 2

URL_ParseURL 1
(SWI &83E07)

URL_ParseURL_ReturnData: Split a URL into its component parts.

Use

When R1 is 1 on entry to the SWI, the data block is treated as a block of pointers to buffers to receive the components of the URL. Each of the pointers in the data block must be either zero, indicating that the caller is not interested in that field, or point to a buffer which is sufficiently long to receive the field. The client can ensure this by having previously used reason code 0 to determine the length required.

Related SWIs

URL_ParseURL
URL_ParseURL 0
URL_ParseURL 2

URL_ParseURL 2
(SWI &83E07)

URL_ParseURL_ComposeFromComponents: Combine the components of a URL.

Use

When R1 is 2 on entry to the SWI, the data block is treated as containing the broken down fields of a URL. Each of the pointers in the data block must be either zero or point to a buffer containing the value of the component, with the exception of the full URL field, which is a pointer to a buffer to receive the fully canonicalised URL. This buffer is filled in on exit.

Related SWIs

URL_ParseURL
URL_ParseURL 0
URL_ParseURL 1

URL_ParseURL 3
(SWI &83E07)

URL_ParseURL_QuickResolve: Quickly obtain a fully resolved URL.

Use

When R1 is 3 on entry to the SWI, R4 points to a buffer for receiving the fully resolved URL. R5 is the length of the buffer. On exit, the buffer is filled in with the fully resolved URL obtained, and R5 is decreased by the length of the URL (including terminating zero byte). Hence R5 will be negative on exit if the buffer wasn't large enough. There is no fixed rule for calculating the minimum buffer length required for the answer. To guarantee that the buffer is large enough, it should be calculated as:

  length(base URL) + length(relative URL) + 4

If R0:1 is set on entry, there is the potential for up to the entire URL to be hex encoded. In this case, you would need to multiply the above by three. URL 0.37 and earlier never hex encode URLs. Note that URL 0.38, 0.39, 0.40 and 0.41 will always do this; the control through R0:1 was introduced in v0.42. Clients not knowing about this bit (therefore leaving R0:1 unset) will find that 0.42 or later do not automatically escape URLs, this being more sensible default behaviour on the whole.
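The buffer size rule can be expressed as a short sketch. The function names are hypothetical, and taking length() to mean the string length without its terminator is an assumption.

```python
def quick_resolve_buffer_size(base: str, relative: str = "",
                              escape: bool = False) -> int:
    """Size a buffer guaranteed to be large enough for reason code 3."""
    size = len(base) + len(relative) + 4
    # With R0:1 set, every character could expand to a three-character
    # hex escape, so the whole allowance is tripled.
    return size * 3 if escape else size


def resolved_ok(r5_on_exit: int) -> bool:
    """R5 is decreased by the URL length; negative means it did not fit."""
    return r5_on_exit >= 0
```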

Characters which are already hex encoded in URLs are left alone in all versions of the URL module.

Clients are strongly recommended to use this reason code if they wish to resolve a relative URL or canonicalise a URL and are only interested in the fully resolved and canonicalised form of the URL, since it is significantly faster than using reason code 0 and then reason code 1. To help reduce the chances of wildly over-allocating buffer space, setting of R0:1 is not recommended unless full hex escaping is definitely required.

Related SWIs

URL_ParseURL

URL_EnumerateSchemes
(SWI &83E08)

Enumerate available fetch schemes.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Context (0 for first call).

On exit

R0=Status flags (currently unused).
R1=Context for next call (-1 if finished).
R2= Pointer to read-only URL fetch scheme (if R1 is not -1).
R3= Pointer to read-only help string (if R1 is not -1).
R4= Protocol module SWI base (if R1 is not -1).
R5= Protocol module version (*100, if R1 is not -1).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call is used to discover which schemes are currently available to the URL module. It may be used, for example, to determine whether or not a client of the URL module may deal with a given URL (in combination with SWI URL_ParseURL to extract the scheme) and if not, pass it to the Acorn URI handler to see if anything else in the system can deal with it [9].

URL will not cope gracefully if the protocol module list is updated between calls to this SWI (you may get duplicate modules or miss some out).
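The context-driven enumeration pattern might be used as in the sketch below. Here `enumerate_call` is a hypothetical wrapper around the SWI that takes the context and returns the next context together with whatever entry data the caller wants to keep.

```python
def enumerate_all(enumerate_call):
    """Collect every entry from a context-driven enumeration.

    `enumerate_call(context)` returns (next_context, entry); the entry
    is only meaningful when next_context is not -1.
    """
    entries = []
    context = 0                   # 0 on the first call
    while True:
        context, entry = enumerate_call(context)
        if context == -1:         # finished; the entry is not valid
            return entries
        entries.append(entry)
```

Completing the loop in one pass, without yielding control in between, reduces the chance of the list changing part way through the enumeration.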

Related SWIs

None

Related vectors

None

URL_EnumerateProxies
(SWI &83E09)

Enumerate proxies or no-proxy URLs.

On entry

R0= Flags:
Bit(s) Meaning
0 If set, enumerate the no-proxy list.
1-31 Reserved (must be zero).
R1= Session identifier (or zero for global proxies / no-proxies).
R2=Context (0 for first call).

On exit

R0=Status flags (currently unused).
R1=Preserved.
R2= Context for next call (-1 if finished).
R3= If R0:0 clear: Pointer to read-only URL to proxy (if R2 is not -1).
If R0:0 set: Pointer to a read-only URL to not proxy (if R2 is not -1).
R4= If R0:0 clear: Pointer to read-only proxy URL information (if R2 is not -1).
If R0:0 set: Corrupted, contains no useful information.

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call is used to discover the URLs for which proxies are set, on a per-session or global basis, or the URLs which are not to be proxied. The information pointed to by R3 and R4 where applicable is a copy of that which was passed to SWI URL_SetProxy when the setting was made.

If R0:0 is set on entry, then R4 will be corrupted on exit and may not contain a meaningful value.

URL will not cope gracefully if the proxy list is updated between calls to this SWI (you may get duplicate entries or miss some out).

Related SWIs

URL_SetProxy

Related vectors

None

Protocol module to URL module interface

This section defines the calls provided by the URL module to enable a fetcher protocol module to interact with it.

URL_ProtocolRegister
(SWI &83E20)

Register a protocol module with the URL module.

On entry

R0= Flags:
Bit(s) Meaning
0 If set, R5 contains protocol flags word.
1 If set, R6 contains the default port number.
2-31 Reserved (must be zero).
R1=Protocol module's SWI base
R2=URL fetch scheme supported e.g. "http:" etc.
R3=Version number * 100 e.g. 116 => version 1.16
R4= Informational string. Up to 50 characters of descriptive text, e.g. "Acorn HTTP fetcher".
R5= Protocol flags word, if R0:0 set. See below.
R6= Default port number, if R0:1 set. See below.

On exit

R0= Flags: All bits currently reserved (must be zero).

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call is used by a protocol fetcher module to register its SWI base and the type of URL that it accepts with the URL module. The SWIs that are accessible from this SWI base are defined in the following section. If the module cannot be registered (e.g. another module is already claiming that URL base), then an error will be returned. R3 is an integer version number and R4 is a pointer to a string containing more information which will be displayed by the *URLProtoShow command (or 0 if no descriptive text is provided).

Typically, it will be called during a protocol module's initialisation code or on a callback set from the module's initialisation code. If the protocol module is registered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules.

If R0:0 is set, then R5 contains a protocol flags word. This is used to describe to URL how the resolver should treat URLs from this scheme. The current bits defined are:

Bit  Meaning when set
0    Path is not UNIX-like
1    No parsing should be performed on this scheme
2    Scheme allows "user@" to precede the hostname component
3    Hash (ASCII 35) allowed in hostname (e.g. for file: URLs)
4    No hostname component (e.g. mailto: URLs)
5    Remove leading ".." components in pathname.

Note that the meanings of set bits are such that zero is a reasonable value to pass for unknown schemes. Note that if URL is requested to resolve URLs using schemes unknown to it, it will assume a protocol flags word value of zero. This may lead to inconsistent behaviour depending on whether the protocol module is loaded or not.
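A sketch of how a protocol module might assemble the flag-related registers is given below. The flag names and the `register_words` helper are hypothetical; only the bit positions and the meaning of R0:0 and R0:1 come from this specification.

```python
import enum


class ProtocolFlags(enum.IntFlag):
    PATH_NOT_UNIX_LIKE = 1 << 0
    NO_PARSING = 1 << 1
    ALLOW_USER_AT = 1 << 2
    HASH_IN_HOSTNAME = 1 << 3
    NO_HOSTNAME = 1 << 4
    STRIP_LEADING_DOTDOT = 1 << 5


def register_words(flags=None, default_port=None):
    """Return (R0, R5, R6) for URL_ProtocolRegister."""
    r0 = 0
    if flags is not None:
        r0 |= 1 << 0              # R0:0 set: R5 holds the protocol flags
    if default_port is not None:
        r0 |= 1 << 1              # R0:1 set: R6 holds the default port
    return r0, int(flags or 0), default_port or 0
```

A module for one of the schemes already known to URL would simply use `register_words()`, leaving both flag bits clear.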

If R0:1 is set, then R6 contains the default port number for this scheme. This is used by the URL resolving code to determine if explicitly specified port numbers can be elided from the URL. For example, when constructing the canonicalised form of "http://www.acorn.com:80/", the port component is dropped as it serves no useful purpose, leaving "http://www.acorn.com/".
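The effect of the default port on canonicalisation can be sketched in C as below. This is not the URL module's resolver: the function name is hypothetical, and the sketch assumes a URL of the form scheme://authority/path and treats the first "/" after "://" as the end of the authority.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Remove ":port" from the authority part of url, in place, when the port
 * equals default_port. Returns 1 if the URL was changed, 0 otherwise. */
int elide_default_port(char *url, unsigned long default_port)
{
    char *auth = strstr(url, "://");
    if (auth == NULL)
        return 0;
    auth += 3;

    char *end = auth + strcspn(auth, "/");   /* end of authority */
    char *colon = NULL;
    for (char *p = auth; p < end; p++) {
        if (*p == ':')
            colon = p;                       /* last colon before the path */
    }
    if (colon == NULL || !isdigit((unsigned char)colon[1]))
        return 0;

    char *stop;
    unsigned long port = strtoul(colon + 1, &stop, 10);
    if (stop != end || port != default_port)
        return 0;                            /* not a plain default port */

    memmove(colon, end, strlen(end) + 1);    /* close the gap, keep NUL */
    return 1;
}
```

Applied to "http://www.acorn.com:80/" with a default port of 80, this leaves "http://www.acorn.com/"; a URL with port 8080 is left unchanged.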

The URL module is primed with knowledge of the following protocols:

  1. mailto:
  2. telnet:
  3. finger:
  4. file:
  5. filer_opendir:
  6. filer_run:
  7. local:
  8. gopher:
  9. ftp:
  10. http:
  11. https:
  12. whois:

Modules implementing those protocols need not set either flag bit, and hence need not set R5 or R6.

Related SWIs

URL_ProtocolDeregister

Related vectors

None

URL_ProtocolDeregister
(SWI &83E21)

Deregister a protocol module from the URL module.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Protocol module's SWI base

On exit

R0= Flags: All bits currently reserved (must be zero).
R1= Number of client sessions that were using this module.

All other registers preserved.

Interrupts

Interrupt state is undefined.

Re-entrancy

SWI is not re-entrant.

Use

This call should be used by the protocol module to tell the URL module that it is no longer available. The URL module will raise the appropriate disconnect messages with its clients, and tell the protocol module the number of clients that were affected.

Typically, it will be called during a protocol module's finalisation code. If the protocol module is deregistered successfully, then URL will issue a service call Service_URLProtocolModule_ProtocolModule to inform any interested modules.

Related SWIs

URL_ProtocolRegister

Related vectors

None

URL module to protocol module interface

The protocol module SWI interface is called only by the URL module; URL module clients should never call the ReadData/Status/GetData/Stop SWIs directly. Protocol modules are required to supply a SWI interface. Four SWIs currently need to be supported, running from SWI_base to SWI_base+3. New SWIs common to all protocol modules will only be added at the low end of the SWI range. Protocol modules must generate the standard "SWI not known" error (error number &1E6) if they receive a call which they do not understand, so that the URL module can determine that they do not support the SWI. Note that there is no general requirement to use SWIs from offset 0 into a SWI chunk, although it makes sense to do this. Protocol modules which support multiple protocols should place their internal "SWI bases" at least 16 SWIs apart to allow space for future expansion; for example, AcornHTTP registers http: as &83F80 and https: as &83F90.

Protocol-specific SWIs should be added at the top end of the SWI chunk (i.e. start at SWI_base+63 and work down); the AcornHTTP module uses that range to provide clients with access to its HTTP cookie management code, for example.
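The numbering rules above can be sketched in C as follows. The names are hypothetical; the offsets, the 16-SWI spacing and the error number &1E6 are taken from the text. A real module would branch to its handlers rather than return a status value.

```c
/* Offsets of the four common SWIs from a protocol's SWI base. */
enum {
    PROTOCOL_GETDATA  = 0,
    PROTOCOL_STATUS   = 1,
    PROTOCOL_READDATA = 2,
    PROTOCOL_STOP     = 3
};

#define ERROR_SWI_NOT_KNOWN  0x1E6u  /* standard "SWI not known" error number */
#define MIN_BASE_SPACING     16u     /* minimum gap between internal SWI bases */

/* Return 0 if the offset is one of the common SWIs, otherwise the error
 * number the module should raise for a call it does not understand. */
unsigned check_common_swi(unsigned swi_number, unsigned swi_base)
{
    unsigned offset = swi_number - swi_base;
    switch (offset) {
    case PROTOCOL_GETDATA:
    case PROTOCOL_STATUS:
    case PROTOCOL_READDATA:
    case PROTOCOL_STOP:
        return 0;
    default:
        return ERROR_SWI_NOT_KNOWN;
    }
}

/* Non-zero when two internal SWI bases are at least 16 SWIs apart. */
int bases_far_enough(unsigned base_a, unsigned base_b)
{
    unsigned gap = base_a > base_b ? base_a - base_b : base_b - base_a;
    return gap >= MIN_BASE_SPACING;
}
```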

Note: the Session identifiers used by the URL module to talk to the protocol modules are not the same identifiers used by clients to talk to the URL module. They are not interchangeable.

Protocol_GetData
(SWI SWI_base+0)

Start retrieving data.

On entry

R0= Flags:
Bit(s) Meaning
0-30  As specified by client in URL_GetURL.
31 R7 is valid.
R1=Session identifier.
R2= Method (see table earlier in document).
R3=URL (including fetch scheme).
R4= Pointer to block of data in addition to URL.
R5=Protocol dependent.
R6=Protocol dependent.
R7= If R0:31 is set, proxy URL information. See below.

On exit

R0= Protocol status word (see SWI URL_Status for details).

All other registers are protocol dependent.

Interrupts

Interrupt status is protocol module dependent.

Re-entrancy

SWI re-entrancy is protocol module dependent.

Use

This call is used to start retrieving data. The protocol module should raise any events for the client via the session identifier provided in R1. The URL module calls this SWI in response to one of its clients calling SWI URL_GetURL.

The proxy URL information specified in R7 (if R0:31 is set) gives the location of the proxy to be used in the format of a URL. For example, "http://www-cache.demon.co.uk:8080/". This information is supplied by the URL module and not the client. The protocol module must note that on a proxied request, the target URL indicated by R3 may not have the same fetch scheme. For example, it might be an ftp: URL being proxied through an HTTP proxy service.
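A protocol module might decode the proxy information along the following lines. This is a sketch only: both function names are hypothetical, the scheme comparison is case-sensitive for brevity, and the ftp: URL in the usage note is an invented example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GETDATA_R7_VALID ((uint32_t)1 << 31)  /* R0:31, R7 is valid */

/* Return the proxy URL passed in R7, or NULL when R0:31 is clear. */
const char *proxy_for_request(uint32_t r0, const char *r7)
{
    return (r0 & GETDATA_R7_VALID) ? r7 : NULL;
}

/* Length of the scheme name before the first ':', or 0 if there is none. */
static size_t scheme_length(const char *url)
{
    const char *colon = strchr(url, ':');
    return colon != NULL ? (size_t)(colon - url) : 0;
}

/* Non-zero when both URLs use the same fetch scheme. */
int same_scheme(const char *url_a, const char *url_b)
{
    size_t len_a = scheme_length(url_a);
    size_t len_b = scheme_length(url_b);
    return len_a != 0 && len_a == len_b && strncmp(url_a, url_b, len_a) == 0;
}
```

For a target of "ftp://ftp.example.org/pub/" and the proxy "http://www-cache.demon.co.uk:8080/", same_scheme returns zero, which is the case in which the module must carry a foreign-scheme URL over its own protocol.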

Related SWIs

URL_GetURL
URL_ProtocolRegister
URL_ProtocolDeregister
Protocol_Stop

Related vectors

None

Protocol_Status
(SWI SWI_base+1)

Monitor data transfer.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.

On exit

R0= Protocol status word (see SWI URL_Status for details).
R2= As URL_Status.
R3= As URL_Status.
R4= As URL_Status.

All other registers preserved.

Interrupts

Interrupt status is protocol module dependent.

Re-entrancy

SWI re-entrancy is protocol module dependent.

Use

This SWI is used to monitor the transfer of data from the remote service. It is protocol independent, with the exit status bits of R0 being common to all fetcher services. R2 should contain the remote server's most recent response code where possible; note that even in the case of, for example, an HTTP 403 (Forbidden) response, some explanatory data may be received, and thus R3 may be non-zero. If the client is unknown to the protocol module then an error should be returned. If the client's last request has finished, but the client session has not yet been deregistered, then the protocol module should return the status code as of the time that the request finished (i.e. bit 6 or 5 will be set along with another combination if relevant).

The URL module calls this SWI in response to one of its clients calling SWI URL_Status.

Related SWIs

URL_Status
URL_ProtocolRegister
URL_ProtocolDeregister

Related vectors

None

Protocol_ReadData
(SWI SWI_base+2)

Read data pending from a request.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.
R2=Address of client's data buffer.
R3=Size of client's data buffer.

On exit

R0= Protocol status word (see SWI URL_Status for details).
R2= As URL_ReadData.
R3= As URL_ReadData.
R4= As URL_ReadData.
R5= As URL_ReadData.

All other registers preserved.

Interrupts

Interrupt status is protocol module dependent.

Re-entrancy

SWI re-entrancy is protocol module dependent.

Use

This SWI is used to read the data pending from a request, find out how much data has been read on this call and how much more there is remaining to be read for the request. The register usage and description is the same as for SWI URL_ReadData. The URL module calls this SWI in response to one of its clients calling SWI URL_ReadData.

Related SWIs

URL_ReadData
URL_ProtocolRegister
URL_ProtocolDeregister
Protocol_GetData
Protocol_Stop

Related vectors

None

Protocol_Stop
(SWI SWI_base+3)

Abort a current request.

On entry

R0= Flags: All bits currently reserved (must be zero).
R1=Session identifier.

On exit

R0= Protocol status word (see SWI URL_Status for details).

All other registers preserved.

Interrupts

Interrupt status is protocol module dependent.

Re-entrancy

SWI re-entrancy is protocol module dependent.

Use

This call aborts a current request if there is one associated with the session identifier. The URL module calls this SWI in response to one of its clients calling SWI URL_Deregister or SWI URL_Stop.

Related SWIs

URL_Stop
URL_Deregister
URL_ProtocolRegister
URL_ProtocolDeregister

Related vectors

None

URL module service calls

The URL fetcher system has been allocated a block of 256 service calls (&83E00-&83EFF). Two are currently defined. The other 254 are reserved by Acorn for future use.

Service_URLProtocolModule
(Service Call &83E00)

Communicate important events to protocol modules.

On entry

R0=Reason code.
R1=&83E00 (Service_URLProtocolModule).

All other registers are reason code dependent.

On exit

All registers must be preserved, unless claiming the service call. In all the currently defined cases, the service call must not be claimed. Protocol modules must ignore reason codes which they do not understand.

Use

R0 contains a reason code which indicates the type of event:

R0  Name of event
0   URLModuleStarted
1   URLModuleDying

Reason codes

The various reason codes are described below.

Service_URLProtocolModule 0
(Service Call &83E00)

URL module has initialised.

On entry

R0=0 (URLModuleStarted).
R1=&83E00 (Service_URLProtocolModule).
R2=Version number of URL module * 100.

Use

Upon receiving this service call, protocol modules should re-register with the new URL module by issuing SWI URL_ProtocolRegister as usual. They must assume that any previous registration is no longer valid.

This service call must not be claimed.

Service_URLProtocolModule 1
(Service Call &83E00)

URL module is dying.

On entry

R0=1 (URLModuleDying).
R1=&83E00 (Service_URLProtocolModule).
R2=Version number of URL module * 100.

Use

Upon receiving this service call, protocol modules should note that the URL module has gone away and not attempt to talk to it any more until a future Service_URLProtocolModule/URLModuleStarted service call arrives.

This service call must not be claimed.

All other reason codes are reserved to Acorn and must not be used.
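The handling of the two reason codes can be sketched in C as below. The fetcher_state type and the function name are hypothetical, and a real module would receive the registers through its service call entry point. The handler never claims the call and ignores reason codes it does not understand.

```c
#define SERVICE_URL_PROTOCOL_MODULE 0x83E00u

enum {
    URLModuleStarted = 0,
    URLModuleDying   = 1
};

/* Hypothetical per-module record of what is known about the URL module. */
typedef struct {
    int      url_present;    /* non-zero while the URL module may be called */
    int      must_register;  /* non-zero when URL_ProtocolRegister is due */
    unsigned url_version;    /* URL module version * 100, from R2 */
} fetcher_state;

/* Update state from a service call. Registers are passed as plain values;
 * nothing is modified, so the call is passed on unclaimed. */
void on_url_service(fetcher_state *state, unsigned r0, unsigned r1,
                    unsigned r2)
{
    if (r1 != SERVICE_URL_PROTOCOL_MODULE)
        return;

    switch (r0) {
    case URLModuleStarted:
        state->url_present = 1;
        state->must_register = 1;   /* any earlier registration is void */
        state->url_version = r2;
        break;
    case URLModuleDying:
        state->url_present = 0;     /* stop talking to the URL module */
        state->must_register = 0;
        break;
    default:
        break;                      /* unknown reason codes are ignored */
    }
}
```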

Service_URLProtocolModule_ProtocolModule
(Service Call &83E01)

A protocol module has registered or deregistered.

On entry

R0=Reason code.
R1=&83E01 (Service_URLProtocolModule_ProtocolModule).
R2=URL fetch scheme (e.g. "http:", "ftp:").
R3=SWI base chunk of protocol module.
R4= Description of module as shown by *URLProtoShow.

On exit

All registers must be preserved, unless claiming the service call. In all the currently defined cases, the service call must not be claimed. Protocol modules must ignore reason codes which they do not understand.

Use

Defined reason codes:

R0  Event
0   URLProtocolModuleStarted: a protocol module has just registered.
1   URLProtocolModuleDying: a protocol module has just deregistered.

All other reason codes are reserved.

URL module *-commands

The URL module provides a single *-command.

Syntax

*URLProtoShow

Parameters

None

Use

Display information on currently registered protocol modules.

Help text:

*URLProtoShow shows all the current protocols known and their SWI bases.

Example:

*URLProtoShow

Base URL SwiBase  Version  Comment
=============================================================================
 ---     0x83e00    038    URL © Acorn 1997-8 (Built: 07 May 1998)
gopher:  0x508c0    010    Gopher Fetcher © Acorn 1997-8 (Built: 17 Feb 1998)
ftp:     0x4bd00    028    FTP Fetcher © Acorn 1997-8 (Built: 19 Mar 1998)
file:    0x83f40    038    File Fetcher © Acorn 1997-8 (Built: 04 Jun 1998)
http:    0x83f80    082    Acorn HTTP © Acorn 1997-8 (Built: 07 May 1998)

Related SWIs

URL_EnumerateSchemes

URL errors

The URL module is allocated two ranges of error numbers, each range being 256 long. The first 32 errors are reserved to the URL module and the rest are reserved to Acorn protocol modules.

Module  Error range
URL     &80DE00 - &80DE1F
HTTP    &80DE20 - &80DE3F
MAILTO  &80DE40 - &80DE5F
File    &80DE60 - &80DE7F
FTP     &80DE80 - &80DE9F
Gopher  &80DEA0 - &80DEBF
WhoIs   &80DEC0 - &80DEDF
Finger  &80DEE0 - &80DEFF
WAIS    &81EF00 - &81EF1F
HTTPS   &81EF20 - &81EF3F
News    &81EF40 - &81EF5F

Error numbers &81EF60-&81EFFF are reserved for Acorn use only. The URL module errors are:

Error no.  Meaning
&80DE00    Session ID not found. A client passed an unknown session ID in R1 to one of the URL module's SWIs.
&80DE01    URL ran out of memory
&80DE02    No matching fetcher for the URL could be found
&80DE03    SWI not found (URL Module). URL attempted to call a fetcher's SWI and received a SWI not known error.
&80DE04    Session already has had an object fetch performed in it. You cannot re-use this session.
&80DE05    No fetch in progress for this session ID. You have called URL_ReadData or URL_Status having already terminated the fetch.
&80DE06    SWI Method already exists. URL already knows of a module which provides this method for fetching - another cannot register.
&80DE07    No fetch in progress for this session ID. You have not called URL_GetURL before URL_Stop, URL_ReadData or URL_Status.
&80DE08    Message not found in Messages file.
&80DE09    (No longer used)
&80DE0A    Unable to parse URL.

Error numbers for protocol modules are not within the scope of this specification.
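As a worked illustration of the allocation above, the following C sketch maps an error number to the module that owns it. The function and type names are hypothetical; the range values are copied from the error range table, each module owning 32 consecutive numbers.

```c
#include <stddef.h>

/* One 32-number slot from the error range table. */
typedef struct {
    const char    *module;
    unsigned long  first;   /* first error number in the slot */
} error_slot;

static const error_slot error_slots[] = {
    { "URL",    0x80DE00 }, { "HTTP",   0x80DE20 }, { "MAILTO", 0x80DE40 },
    { "File",   0x80DE60 }, { "FTP",    0x80DE80 }, { "Gopher", 0x80DEA0 },
    { "WhoIs",  0x80DEC0 }, { "Finger", 0x80DEE0 }, { "WAIS",   0x81EF00 },
    { "HTTPS",  0x81EF20 }, { "News",   0x81EF40 }
};

/* Return the owning module's name, "reserved" for the range kept for
 * Acorn use, or NULL if the number lies outside both allocated ranges. */
const char *error_owner(unsigned long errnum)
{
    size_t count = sizeof error_slots / sizeof error_slots[0];
    for (size_t i = 0; i < count; i++) {
        if (errnum >= error_slots[i].first &&
            errnum <= error_slots[i].first + 0x1F)
            return error_slots[i].module;
    }
    if (errnum >= 0x81EF60 && errnum <= 0x81EFFF)
        return "reserved";
    return NULL;
}
```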

Performance targets

Final code size of the version described by this document should be about 25K. When fetches are active, more memory will be claimed from the RMA to record details of the session. The amount claimed depends on the URL being fetched plus the small overhead for the session information.

Temporary workspace is claimed from the RMA as required for URL resolution equivalent to three times the total combined length of the base and relative URLs involved.
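The workspace figure can be estimated as in the C sketch below. The function name is hypothetical, and the sketch assumes that "length" means the string lengths excluding terminators, which the text does not state.

```c
#include <stddef.h>
#include <string.h>

/* Estimate of temporary RMA workspace for resolving relative against base:
 * three times the combined length of the two URLs (terminators excluded,
 * which is an assumption of this sketch). */
size_t resolve_workspace_estimate(const char *base, const char *relative)
{
    return 3 * (strlen(base) + strlen(relative));
}
```

For a base of "http://www.acorn.com/" (21 characters) and a relative URL of "index.html" (10 characters), the estimate is 3 * 31 = 93 bytes.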

Workspace is claimed from the RMA to store details of registered proxies.

All session-specific memory, including proxy information, is freed when the session is terminated.

Glossary

FTP
File Transfer Protocol - an application level protocol for the transfer of files between a remote host computer and a local client, as defined by RFC 959 [6].
HTTP
HyperText Transfer Protocol - a protocol designed to transfer resources ("documents") from a remote server machine to a local client, as defined by RFC 1945 (version 1.0 [4]) and RFC 2068 (version 1.1 [5]).
HTTPS
Secure HyperText Transfer Protocol - HTTP protocol over a communication channel encrypted using SSL.
URL
Uniform Resource Locator, as defined by RFC 1738 [2] and RFC 1808 [3] - a subclass of URIs (Uniform Resource Identifiers, defined in RFC 1630 [1]) which map onto network access protocols. More commonly, the addresses of objects on the World Wide Web.
NNTP
Network News Transfer Protocol, as defined by RFC 977 [7].
Gopher
The Internet Gopher Protocol - a distributed document search and retrieval protocol.
SSL
Secure Sockets Layer. A specification for encryption of communications on networks.
WAIS
Wide Area Information Servers, as defined by RFC 1625 [8].

References

The following references may be of interest:
RFC 1630
Uniform Resource Identifiers
RFC 1738
Uniform Resource Locators
RFC 1808
Relative Uniform Resource Locators
RFC 1945
HyperText Transfer Protocol (HTTP) version 1.0
RFC 2068
HyperText Transfer Protocol (HTTP) version 1.1
RFC 959
File Transfer Protocol (FTP)
RFC 977
Network News Transfer Protocol (NNTP)
RFC 1625
Wide Area Information Servers (WAIS) over Z39.50-1988
1215,215/FS
Acorn URI Handler Functional Specification
Last updated 20 November 1998
© Acorn Computers Limited, 1997