String formatting in C++20: std::format()

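The first argument of format() is the string to be formatted; the remaining arguments are the values that fill the placeholders in it. The placeholders used with format() are usually a pair of curly braces: {}. Inside the braces there can be a string of the form [index][:specifier]. You can either omit index in every placeholder, or give every placeholder a zero-based index that says which of the second and subsequent arguments it uses. If index is omitted, the values passed as the second and subsequent arguments to format() fill the placeholders in the order they are given.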

Stack Overflow？Tan90 · published 2023-03-26 22:05:03


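Before C++20, strings were usually formatted either with printf()-style functions or with C++ I/O streams.

C-style functions: not recommended, because they are not type-safe and cannot be extended to support custom types.
C++ I/O streams: the literal text and the arguments are interleaved, which hurts readability and makes the text hard to translate into other languages.

C++20 introduces std::format(), defined in <format>, for formatting strings. It combines essentially all the advantages of C-style functions and C++ I/O streams in a type-safe, extensible mechanism.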

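At the time of writing, GCC did not yet support <format>, so the fmt library has to be added manually as a substitute. (I have since installed GCC 13, which implements <format> along with other C++20/23 features, so fmt is no longer needed there.)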

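(1) Download the latest release of fmt from https://fmt.dev/ and extract it on your machine.

(2) Copy the include/fmt and src directories into your project directory, then add fmt/core.h, fmt/format.h, fmt/format-inl.h and src/format.cc to the project.

(3) Add the following code to a source file in your project's root directory: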

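```
// src/format.cc holds fmt's compiled implementation; include it in only one
// .cpp file, or the linker will report duplicate symbols.
#include "fmt/format.h"
#include "src/format.cc"

// Make the fmt names usable as std::format etc. Strictly speaking, adding
// declarations to namespace std is undefined behavior, so treat this only
// as a stopgap until the compiler provides <format>.
namespace std {
    using fmt::format;
    using fmt::format_error;
    using fmt::formatter;
}
```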

Function declarations

```
// Defined in header <format>; all four overloads are available since C++20
template< class... Args >
std::string format( std::format_string<Args...> fmt, Args&&... args );          // (1)
template< class... Args >
std::wstring format( std::wformat_string<Args...> fmt, Args&&... args );        // (2)
template< class... Args >
std::string format( const std::locale& loc,
                    std::format_string<Args...> fmt, Args&&... args );          // (3)
template< class... Args >
std::wstring format( const std::locale& loc,
                     std::wformat_string<Args...> fmt, Args&&... args );        // (4)
```

For example:

```
// Indices omitted from the placeholders
auto s1 {format("read {} bytes from {}", 100, "file.txt")};
// Explicit indices
auto s2 {format("read {0} bytes from {1}", 100, "file.txt")};
// Indices let you change the order in which the values appear in the output
auto s3 {format("read {1} bytes from {0}", 100, "file.txt")}; // s3 == "read file.txt bytes from 100"
// Mixing manual and automatic indexing is not allowed; the following line is invalid
// auto s4 {format("read {0} bytes from {}", 100, "file.txt")};
```

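Format specifiers

A format specifier controls how a value is formatted in the output and is introduced by a colon. Its general form is:

[[fill]align][sign][#][0][width][.precision][type] // every part in [] is optional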

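width: width specifies the minimum width of the field the formatted value occupies. width can also be another pair of curly braces, called a dynamic width. If the inner braces contain an index, e.g. {5}, the dynamic width is taken from the format() argument with that index. If no index is given, e.g. {}, the width is taken from the next argument in format()'s argument list. For example: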
```
int i{12};
string s{"helloworld"};
cout << format("|{:5}|", i) << endl;
cout << format("|{:{}}|", i, 7) << endl;
cout << format("|{:5}|", s) << endl;
cout << format("|{:{}}|", s, 12) << endl;
```
Output:
```
|   12|
|     12|
|helloworld|
|helloworld  |
```
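[fill]align: fill is the character used for padding (a space by default), and align is how the value is aligned within its field:
1. '<' left-aligns (the default for non-integer, non-floating-point values).
2. '>' right-aligns (the default for integers and floating-point values).
3. '^' centers.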

```
int i{12};
string s{"helloworld"};
cout << format("|{:<5}|", i) << endl;
cout << format("|{:{}}|", i, 7) << endl;
cout << format("|{:5}|", s) << endl;
cout << format("|{:>{}}|", s, 12) << endl;
cout << format("|{:-^6}|", i) << endl;
```

Output:

```
|12   |
|     12|
|helloworld|
|  helloworld|
|--12--|
```

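sign: one of the following three:

'-' shows a sign only for negative numbers (the default).
'+' shows a sign for both positive and negative numbers.
space uses a minus sign for negative numbers and a leading space for positive numbers.

For example: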

```
int i{12};
cout << format("|{:<5}|", i) << endl;
cout << format("|{:<+5}|", i) << endl;
cout << format("|{:< 5}|", i) << endl;
cout << format("|{:< 5}|", -i) << endl;
```

Output:

```
|12   |
|+12  |
| 12  |
|-12  |
```

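#: enables the so-called alternate form. For integer types combined with a hexadecimal, binary, or octal presentation type, the alternate form inserts 0x, 0X, 0b, 0B, or 0 in front of the formatted number. For floating-point types, the alternate form always outputs the decimal point, even if no digits follow it.
type: specifies the presentation type used for the value. The options are:
1. Integer types: b (binary), B (binary; with #, the prefix is 0B instead of 0b), d (decimal), o (octal), x (hexadecimal with lowercase a-f), X (hexadecimal with uppercase A-F; with #, the prefix is 0X instead of 0x). If type is omitted, integers use d.
2. Floating-point types:
  1. e, E: scientific notation with a lowercase e or uppercase E for the exponent, using the given precision, or 6 if none is given.
  2. f, F: fixed notation, using the given precision, or 6 if none is given.
  3. g, G: general notation with a lowercase e or uppercase E for the exponent, using the given precision, or 6 if none is given.
  4. a, A: hexadecimal notation with lowercase (a) or uppercase (A) letters.
  5. If type is omitted, a floating-point value is printed in the shortest form that round-trips; when a precision is given, the result is the same as with g.
3. bool: s (outputs true or false as text), b, B, c, d, o, x, X (outputs 1 or 0 as an integer). If type is omitted, bool uses s.
4. Character types: c (outputs the character itself), b, B, d, o, x, X (integer representation). If type is omitted, characters use c.
5. Strings: s (outputs the string as is). If type is omitted, strings use s.
6. Pointers: p (hexadecimal with a 0x prefix). If type is omitted, pointers use p.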
```
int i{12};
cout << format("|{:10d}|", i) << endl;
cout << format("|{:10b}|", i) << endl;
cout << format("|{:#10b}|", i) << endl;
cout << format("|{:10X}|", i) << endl;
cout << format("|{:#10X}|", i) << endl;
```
Output:
```
|        12|
|      1100|
|    0b1100|
|         C|
|       0XC|
```

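precision: precision can only be used with floating-point and string types. It is written as a dot followed by the number of digits to output for a floating-point value (digits after the decimal point for e and f, significant digits for g and the default presentation), or the maximum number of characters to output for a string. Like width, it can also be another pair of curly braces, in which case it is called a dynamic precision: the precision is then taken from the next argument in format()'s argument list, or from the argument with the given index.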

```
double d{3.1415 / 2.3};
cout << format("|{:12g}|", d) << endl;
cout << format("|{:12.3}|", d) << endl; // 12 is the minimum field width; .3 is the number of significant digits
cout << format("|{:12e}|", d) << endl;

int width{12};
int precision{3};
cout << format("|{2:{0}.{1}f}|", width, precision, d) << endl;
```

Output:

```
|     1.36587|
|        1.37|
|1.365870e+00|
|       1.366|
```

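0: for numeric values, 0 pads the formatted result with zeros up to the minimum width given by [width]. The zeros are inserted in front of the number, but after the sign and after any 0x, 0X, 0b, or 0B prefix. If an explicit alignment is also given, the 0 is ignored and the normal fill character is used instead.

```
int i{12};
cout << format("|{:06d}|", i) << endl;
cout << format("|{:^06d}|", i) << endl; // alignment given, so the 0 is ignored
cout << format("|{:+06d}|", i) << endl;
cout << format("|{:06X}|", i) << endl;
cout << format("|{:#06X}|", i) << endl;
```

Output:

```
|000012|
|  12  |
|+00012|
|00000C|
|0X000C|
```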

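Format specifier errors

As mentioned above, format specifiers must follow strict rules. When the format string is a compile-time constant, as in all the examples so far, std::format() checks it during compilation, so an invalid specifier is a compile error. When the format string is only known at run time and is passed to std::vformat(), an invalid specifier causes a std::format_error exception to be thrown:

```
try
{
    int n{5};
    // A run-time format string is only checked when vformat() runs
    cout << vformat("An integer: {:.}", make_format_args(n)) << endl;
}
catch (const format_error &error)
{
    cout << error.what() << endl;
}
```

Output (the message text is implementation-specific; this is fmt's wording):

```
missing precision specifier
```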

Supporting custom types

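The C++20 formatting library can be extended to support custom types. This involves writing a specialization of the std::formatter class template that contains two member function templates: parse() and format().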

Suppose we have a class that stores a key-value pair:

```
class KeyValue
{
public:
    KeyValue(string_view key, int value) : m_key{key}, m_value{value} {}
    const string &getKey() const { return m_key; }
    int getValue() const { return m_value; }

private:
    string m_key;
    int m_value;
};
```

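The following class template specialization implements a custom formatter for KeyValue objects. It also supports custom format specifiers: {:a} outputs only the key, {:b} only the value, and both {:c} and {} output the key and the value.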

```
#include <format>   // GCC 13 or later; with older compilers use the fmt shim shown earlier
#include <iostream>
#include <string>
#include <string_view>

using namespace std;

class KeyValue
{
public:
    KeyValue(string_view key, int value) : m_key{key}, m_value{value} {}
    const string &getKey() const { return m_key; }
    int getValue() const { return m_value; }

private:
    string m_key;
    int m_value;
};

template <>
class std::formatter<KeyValue>
{
public:
    // constexpr lets parse() run at compile time, where std::format
    // validates the format string
    constexpr auto parse(auto &context)
    {
        auto iter{context.begin()};
        const auto end{context.end()};
        if (iter == end || *iter == '}') // no specifier, i.e. "{}"
        {
            m_outputType = OutputType::KeyAndValue;
            return iter;
        }

        switch (*iter)
        {
        case 'a':
            m_outputType = OutputType::KeyOnly;
            break;
        case 'b':
            m_outputType = OutputType::ValueOnly;
            break;
        case 'c':
            m_outputType = OutputType::KeyAndValue;
            break;
        default:
            throw format_error{"Invalid KeyValue format specifier!"};
        }

        ++iter;
        if (iter != end && *iter != '}') // only one specifier character is allowed
        {
            throw format_error{"Invalid KeyValue format specifier!!!"};
        }
        return iter; // points at the closing '}'
    }

    // Declared const, as the C++23 formatter requirements (and recent
    // standard libraries) expect format() to be callable on a const formatter
    auto format(const KeyValue &kv, auto &context) const
    {
        using enum OutputType;

        switch (m_outputType)
        {
        case KeyOnly:
            return format_to(context.out(), "{}", kv.getKey());
        case ValueOnly:
            return format_to(context.out(), "{}", kv.getValue());
        default:
            return format_to(context.out(), "{}, {}", kv.getKey(), kv.getValue());
        }
    }

private:
    enum class OutputType
    {
        KeyOnly,
        ValueOnly,
        KeyAndValue
    };
    OutputType m_outputType{OutputType::KeyAndValue};
};

int main()
{
    KeyValue keyValue{"key1", 6};
    cout << format("{}", keyValue) << endl;
    cout << format("{:a}", keyValue) << endl;
    cout << format("{:b}", keyValue) << endl;
    cout << format("{:c}", keyValue) << endl;
    // Invalid specifiers are rejected when the format string is checked at compile time:
    // cout << format("{:d}", keyValue) << endl;
    // cout << format("{:ab}", keyValue) << endl;

    return 0;
}
```

Output (compile with -std=c++20):

```
key1, 6
key1
6
key1, 6
```

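The parse() method parses the format specifier in the range [context.begin(), context.end()). It stores the result in data members of the formatter class and returns an iterator to the character just past the parsed specifier, that is, the closing }.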

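The format() method writes its output to context.out() according to the output type that parse() stored, and returns an iterator to the end of the output. The code above does the actual formatting by forwarding to std::format_to(). format_to() takes an output iterator as its first argument and writes the resulting characters through it (for example into a preallocated buffer), whereas format() creates and returns a new string object.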