Posts from March 2015

C Language: Translation Phases

Posted: 2015/03/29

小川清


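The precedence among the syntax rules of translation is specified by the following phases.

An implementation may in practice combine several of these phases, but it shall behave as if the separate phases described here occurred. [S]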

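(1) Physical conversion (multibyte characters, new-lines, trigraphs)

Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set.

New-line characters are introduced for end-of-line indicators.

Trigraph sequences are replaced by the corresponding single-character internal representations.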

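(2) Splicing (backslash handling)

Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.

* A source file that is not empty shall end in a new-line character. [S]

* Furthermore, that new-line character shall not be immediately preceded by a backslash character (before any such splicing takes place). [S]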

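(3) Preprocessing tokens and white space

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments).

* A source file shall not end in a partial preprocessing token or in a partial comment. [S]

Each comment is replaced by one space character. New-line characters are retained.

+ Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.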

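(4) Preprocessing

Preprocessing directives are executed and macro invocations are expanded.

In addition, _Pragma unary operator expressions are executed.

+ If a character sequence that matches the syntax of a universal character name is produced by token concatenation, the behavior is undefined.

A header or source file named in a #include preprocessing directive is processed from phase (1) through phase (4), recursively. All preprocessing directives are then deleted.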

(5) Conversion to the execution character set

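Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set.

+ If there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.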

(6) Concatenation (string literals)

Adjacent string literal tokens are concatenated.

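(7) Translation

White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token.

The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.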

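(8) Linkage (objects, functions)

All external object and function references are resolved.

Library components are linked to satisfy external references to functions and objects not defined in the current translation unit.

All such translator output is collected into a single program image that contains the information needed for execution in its execution environment.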

Assumptions and background

An implementation runs on one or more operating systems.

An OS supports one or more multibyte character encodings.

An OS supports one or more new-line conventions.

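Notes on translation phase (1)

Few character sets or display devices in use today are unable to show the characters that trigraphs stand for. In the past such devices did exist, for example terminal emulators for mainframes.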

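Notes on translation phase (2)

In Japanese environments, the backslash character is sometimes displayed as ¥ (yen sign). A single-byte ¥ has the same code as \, so it functions as a backslash. A double-byte (full-width) ¥ does not have the same code as the backslash, so it does not function as one.

Even in Japanese environments, some programming and development environments display \ rather than ¥.

In some multibyte encodings, a character may contain a byte with the same code as the backslash, for example ＊＊＊ in Shift_JIS.

An implementation that accepts multibyte characters maps such a character, in translation phase 1, to a member of the source character set that is distinct from the backslash.

Some implementations that accept multibyte characters only nominally, or that do not support the multibyte encoding used in the physical source file, may not perform the multibyte processing of phase 1 correctly.

If you always put a space at the end of a // comment, then even when multibyte processing is incomplete, a trailing byte that happens to equal the backslash is not immediately followed by a new-line, so the backslash processing of the next phase (phase 2) is not triggered unintentionally.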

Notes on translation phase (3)

6.4 Lexical elements

A token is the minimal lexical element of the language in translation phases 7 and 8.

The categories of tokens are: keywords, identifiers, constants, string literals, and punctuators.

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.

The categories of preprocessing tokens are: header names, identifiers, preprocessing numbers, character constants, string literals, punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories.69) If a ' or a " character matches the last category, the behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.

As described in 6.10, in certain circumstances during translation phase 4,

69) An additional category, placemarkers, is used internally in translation phase 4 (see 6.10.3.3); it cannot occur in source files.

6.4.1 Keywords

The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as keywords, and shall not be used otherwise.

The keyword _Imaginary is reserved for specifying imaginary types.70)

When preprocessing tokens are converted to tokens during translation phase 7, if a preprocessing token could be converted to either a keyword or an identifier, it is converted to a keyword.

6.4.2.2 Predefined identifiers

This name is encoded as if the implicit declaration had been written in the source character set and then translated into the execution character set as indicated in translation phase 5.

6.4.5 String literals

In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence. If any of the tokens has an encoding prefix, the resulting multibyte character sequence is treated as having the same prefix; otherwise, it is treated as a character string literal. Whether differently-prefixed wide string literal tokens can be concatenated and, if so, the treatment of the resulting multibyte character sequence are implementation-defined.

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.78)

6.4.8 Preprocessing numbers

A preprocessing number does not have type or a value; it acquires both after a successful conversion (as part of translation phase 7) to a floating constant token or an integer constant token.

6.10 Preprocessing directives

A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either the first character in the source file (optionally after white space containing no new-line characters) or that follows white space containing at least one new-line character.

The only white-space characters that shall appear between preprocessing tokens within a preprocessing directive (from just after the introducing # preprocessing token through just before the terminating new-line character) are space and horizontal-tab (including spaces that have replaced comments or possibly other white-space characters in translation phase 3).

EXAMPLE In:

#define EMPTY
EMPTY # include <file.h>

the sequence of preprocessing tokens on the second line is not a preprocessing directive, because it does not begin with a # at the start of translation phase 4, even though it will do so after the macro EMPTY has been replaced.

6.10.1 Conditional inclusion

166) Because the controlling constant expression is evaluated during translation phase 4, all identifiers either are or are not macro names; there simply are no keywords, enumeration constants, etc.

167) Thus, on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it would be unsigned in translation phase 7.

6.10.2 Source file inclusion

170) Note that adjacent string literals are not concatenated into a single string literal (see the translation phases in 5.1.1.2); thus, an expansion that results in two string literals is an invalid directive.

6.10.3 Macro replacement

171) Since, by macro-replacement time, all character constants and string literals are preprocessing tokens, not sequences possibly containing identifier-like subsequences (see 5.1.1.2, translation phases), they are never scanned for macro names or parameters.

6.10.3.3 The ## operator

173) Placemarker preprocessing tokens do not appear in the syntax because they are temporary entities that exist only within translation phase 4.

6.10.3.5 Scope of macro definitions

A macro definition lasts (independent of block structure) until a corresponding #undef directive is encountered or (if none is encountered) until the end of the preprocessing translation unit. Macro definitions have no significance after translation phase 4.

6.10.4 Line control

The line number of the current source line is one greater than the number of new-line characters read or introduced in translation phase 1 (5.1.1.2) while processing the source file to the current token.

6.10.9 Pragma operator

The resulting sequence of characters is processed through translation phase 3 to produce preprocessing tokens that are executed as if they were the pp-tokens in a pragma directive.

J.2 Undefined behavior

A reserved keyword token is used in translation phase 7 or 8 for some purpose other than as a keyword (6.4.1).

J.3 Implementation-defined behavior

J.3.1 Translation

* Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character in translation phase 3 (5.1.1.2).

J.3.2 Environment

* The mapping between physical source file multibyte characters and the source character set in translation phase 1 (5.1.1.2).

strict aliasing

Posted: 2015/03/22

小川清

(Translation) Understanding Strict Aliasing in C/C++, or: Why won't the #$@##@^% compiler let me do what I want?
http://d.hatena.ne.jp/yohhoy/20120220/p1
