

Chapter 2 Background Knowledge

2.1 Internet Email Standards and Fundamentals

2.1.2 Mail Message Format and Encoding

As mentioned in the previous section, the SMTP protocol defined in RFC 821[1] does not define the format of the message header or the message body. However, every aspect of email revolves around its keystone, the message, so understanding email requires a solid understanding of how messages are structured. All Internet electronic mail is built upon the simple text message format described in RFC 822[2].

RFC 822 defines the basic structure of a mail message, including how the message headers and body are presented. Several later RFCs have extended RFC 822, but the basic structure remains the same.

The message below shows what a typical Internet email message may look like after it has been transmitted between systems:

Received: (qmail 32161 invoked from network); 28 Dec 2001 01:22:23 -0000
Received: from mail.somedomain.com (10.45.124.32) by mail.yourdomain.com with SMTP; 28 Dec 2001 01:22:23 -0000
Received: (from user@localhost) by mail.somedomain.com (8.11.6/8.11.6) id fBS1Mon10019; Thu, 27 Dec 2001 19:22:50 -0600
Date: Thu, 27 Dec 2001 19:22:50 -0600
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Subject: Hello

Hi there, how's it going?

An Internet email message contains two parts: the message header and the message body. A message begins with several headers, which are formatted lines. A header begins with a header identifier, followed by a colon and a space, and then the contents of the header. Most standard header identifiers are specified in RFC 822 and in the later RFCs that extend it [4][5][6][7]. Headers used for non-standard purposes may be created with names of the form “X-field-name”. After the headers comes an empty line, followed by the message body. In this case, the message body is the text “Hi there, how’s it going?”; all the other text is part of the message header.
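The split between header and body at the first empty line can be sketched in a few lines of Python. This is only an illustration: the function name `split_message` and the example.com addresses are hypothetical, and header folding is not handled.

```python
def split_message(raw: str) -> tuple[list[str], str]:
    """Split a raw message into its header lines and its body.

    The first empty line marks the end of the headers.
    """
    # Normalise line endings so the sketch accepts CRLF and bare LF input.
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    return head.split("\n"), body


raw_message = (
    "Date: Thu, 27 Dec 2001 19:22:50 -0600\r\n"
    "From: alice@example.com\r\n"   # hypothetical address
    "To: bob@example.com\r\n"       # hypothetical address
    "Subject: Hello\r\n"
    "\r\n"
    "Hi there, how's it going?\r\n"
)

header_lines, body = split_message(raw_message)
print(header_lines)
print(body)
```

Everything before the empty line ends up in `header_lines`; everything after it is returned unchanged as the body.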

Message Headers

A mail message is very simple: it consists of a series of text lines. Each text line is terminated with a CR (carriage-return) character followed by an LF (line-feed) character. This forms a carriage-return-linefeed combination, CRLF.

The lines of a header are grouped into fields. They provide information about the piece of mail that is to be used by both users and programs.

Each header field consists of a field name, optional whitespace, a colon, optional folding whitespace, and an optional field body. The field body may also begin with leading whitespace. Usually, there is no whitespace between the field name and the colon.

field-name[SP]”:”[SP][field-body]CRLF

The field-name consists of a sequence of printable US-ASCII characters, excluding the space character and the colon. Most field names are a series of alphanumeric characters, often combined with the “-” character.
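As a rough illustration of this syntax, the following Python sketch separates a single unfolded header line into its field name and field body. The regular expression and the helper name `parse_field` are assumptions made for this example, not part of RFC 822; the character class simply covers printable US-ASCII without the space and the colon.

```python
import re

# Field name: printable US-ASCII (codes 33 to 126) except the colon.
# Optional spaces or tabs are tolerated on both sides of the colon.
FIELD_RE = re.compile(r"^([!-9;-~]+)[ \t]*:[ \t]*(.*)$")


def parse_field(line: str) -> tuple[str, str]:
    """Return (field-name, field-body) for one unfolded header line."""
    match = FIELD_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a header field: {line!r}")
    return match.group(1), match.group(2)


print(parse_field("Subject: Hello\r\n"))
print(parse_field("X-Mailer :demo"))
```

A line without a colon after the first token, such as plain body text, is rejected with a `ValueError`.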

RFC 822 details the standard header fields used when sending mail across the Internet. Most of these fields are quite common and are found in most email systems. The standard set of fields is shown in Table 4.

The lines in a mail message are not allowed to be longer than 1,000 characters; moreover, long lines can be difficult for humans to read. For this reason, a field may be split across multiple lines. This is called folding, and it is allowed in both structured and unstructured fields.

Human readability was also an important concern in creating the mail message standards, and folding is a good example. It was built into the standard so that lines can be kept in the range of 72 to 76 characters, which makes email messages easier to read.

Field name     Description

From           The creator of the message
Sender         The sender of the message
Reply-To       The address to send replies to
To             Primary recipients of the message
Cc             Secondary recipients of the message
Bcc            Blind Carbon Copy recipients of the message
Message-ID     The message’s unique identifier
In-Reply-To    The message being replied to
References     All message ancestors
Date           The date the message was created
Received       MTA footprint
Return-Path    The address of the originator
Subject        The subject of the message
Comments       Miscellaneous comments regarding the message
Keywords       Topical keywords related to the message
Encrypted      Encryption information (obsolete)
Resent-*       Fields created when redistributing
X-*            Extension fields

Table 4 Standard Header Fields

A long line is folded by inserting a CRLF followed by some amount of whitespace, which breaks it into shorter lines and improves readability. The whitespace is necessary: a CRLF alone will not work, since the mail system interprets the next line as the beginning of a new header. An example of a folded header follows:

Content-Type: multipart/related; type="multipart/alternative";
    boundary="----=_NextPart_XQbgIHkK8cEhPs1ZmYjUkfroG"

Whitespace can consist of either ASCII space or TAB characters.
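Reversing the folding, usually called unfolding, amounts to removing every line break that is immediately followed by a space or TAB. The Python sketch below shows the idea on header lines that have already been split at CRLF; the function name `unfold` and the shortened boundary value are made up for illustration.

```python
def unfold(header_lines: list[str]) -> list[str]:
    """Join continuation lines (those starting with space or TAB)
    onto the header field they belong to."""
    fields: list[str] = []
    for line in header_lines:
        if fields and line[:1] in (" ", "\t"):
            # Continuation line: drop only the line break and keep
            # the leading whitespace as part of the field body.
            fields[-1] += line
        else:
            fields.append(line)
    return fields


folded = [
    'Content-Type: multipart/related; type="multipart/alternative";',
    '    boundary="example-boundary"',   # hypothetical boundary value
    "Subject: Hello",
]
print(unfold(folded))
```

A line that does not start with whitespace always begins a new field, which is why a bare CRLF cannot be used to fold.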

Message Body

The message body comes after the message headers. An empty line separates the two components. Everything before the empty line is considered to be a header, and everything after it is considered to be the message body.

RFC 822 does not prescribe a detailed format for the message body, but it does impose some constraints on it. Although all types of content can be handed to SMTP, messages that do not conform to these constraints may not make it across the Internet's SMTP network. For example, messages that contain 8-bit binary data may not be able to cross a multiple-hop mail route, since some systems only support 7-bit ASCII characters.
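Whether a piece of content is safe for a 7-bit transport can be checked by looking at the high bit of every byte. The following Python sketch illustrates that check only; the helper name `is_seven_bit` is hypothetical, and other constraints such as line length are not tested.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (values 0 to 127)."""
    return all(byte < 0x80 for byte in data)


print(is_seven_bit(b"Hi there, how's it going?"))   # plain ASCII text
print(is_seven_bit("café".encode("utf-8")))         # contains 8-bit bytes
print(is_seven_bit(bytes([0x89, 0x50, 0x4E, 0x47])))  # arbitrary binary data
```

Content that fails this check has to be encoded before it can travel reliably over such a route.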

To convert extended data into 7-bit ASCII, two encoding techniques are in popular use today: Uuencode and MIME. Uuencode is based on free technology and has been around for a long time. However, it has several limitations that prevent its use in many situations. MIME was developed as a solution, providing a mechanism for rich content over an email transport.

Uuencode and Uudecode

Uuencode and Uudecode provide a method for encoding extended data into 7-bit ASCII form and decoding it back. However, this method has not been defined as an RFC.

Uuencode and Uudecode support only a single flat namespace on the filesystem and offer little flexibility. They do not work well with Macintosh, AS/400, HP-3000, or other systems that use a multi-node file system.

Although any number of Uuencoded file attachments can be included in a message, the receiving mail system may not always integrate them cleanly into the message body. Also, because international language support for Asian and European keyboards needs 8-bit characters in the message body, an attachment-centric approach is not an apt solution. MIME is better suited to this particular problem.
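The core of the Uuencode transformation can be tried out with Python's standard `binascii` module, whose `b2a_uu` function encodes at most 45 bytes into one line of printable ASCII. The sketch below is a simplified illustration: the helper names are hypothetical, and the `begin` and `end` lines that frame a complete uuencoded file are left out.

```python
import binascii

CHUNK = 45  # b2a_uu accepts at most 45 input bytes per output line


def uu_encode_lines(data: bytes) -> list[bytes]:
    """Encode binary data as a list of uuencoded ASCII lines."""
    return [binascii.b2a_uu(data[i:i + CHUNK])
            for i in range(0, len(data), CHUNK)]


def uu_decode_lines(lines: list[bytes]) -> bytes:
    """Decode a list of uuencoded lines back into the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))          # every possible 8-bit value
encoded = uu_encode_lines(payload)
print(encoded[0])
print(uu_decode_lines(encoded) == payload)
```

Each encoded line contains only 7-bit characters, so the data survives a transport that strips the high bit.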

Multipurpose Internet Mail Extensions (MIME)

Unlike Uuencode and Uudecode, the Multipurpose Internet Mail Extensions (MIME) are defined as an Internet standard for the format of email. MIME enables Internet email to carry binary (non-text) data such as graphics, audio, video, and other types of files, and the Internet mail system can use it to attach files to mail messages. The main MIME RFCs are draft standards (see Table 5). The combination of the SMTP protocol and the MIME specification is the basis of the modern Internet mail system.
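To make the attachment mechanism concrete, the following Python sketch builds a small MIME message with the standard library's `email.message.EmailMessage` class. The addresses, subject, file name, and payload bytes are invented for the example; the library chooses the boundary string and encodes the binary attachment with base64 by default.

```python
from email.message import EmailMessage

binary_payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # made-up bytes

msg = EmailMessage()
msg["From"] = "alice@example.com"    # hypothetical address
msg["To"] = "bob@example.com"        # hypothetical address
msg["Subject"] = "Report"
msg.set_content("See the attached file.")

# Adding an attachment turns the message into a multipart/mixed
# container holding the text part and the binary part.
msg.add_attachment(
    binary_payload,
    maintype="application",
    subtype="octet-stream",
    filename="sample.bin",
)

wire = msg.as_bytes()
print(msg.get_content_type())
print(wire.decode("ascii"))
```

Although the attachment contains 8-bit values, the serialized message in `wire` consists solely of 7-bit ASCII, which is what allows it to pass through older mail systems.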

Besides converting data and attaching binary files, Uuencode and MIME can also be used to relay extended characters such as those found in Asian and European languages. A sample message written in English, for example, does not use any of the extended characters found in other languages. Many nations, however, use characters that are not found in the 7-bit ASCII character set, and messages containing them require encapsulation in order to be sent across the Internet reliably. Uuencode and MIME make this possible.
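For header fields, MIME carries extended characters as ASCII-only "encoded words", the mechanism covered by RFC 2047. The Python sketch below uses the standard `email.header` module to encode and decode a subject containing non-ASCII characters; the sample text is arbitrary and chosen only for illustration.

```python
from email.header import Header, decode_header, make_header

subject = "Grüße"   # arbitrary sample text with non-ASCII characters

# Encode the text as an RFC 2047 encoded word using the UTF-8 charset.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decode the encoded word back into the original Unicode string.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded form contains only ASCII characters, so it can be placed in a header such as Subject without violating the 7-bit restriction.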

RFC Number    Title

RFC 2045      MIME Part One: Format of Internet Message Bodies
RFC 2046      MIME Part Two: Media Types
RFC 2047      MIME Part Three: Message Header Extensions for Non-ASCII Text
RFC 2048      MIME Part Four: Registration Procedures
RFC 2049      MIME Part Five: Conformance Criteria and Examples

Table 5 The core MIME RFCs
