std::regex regular expressions

std::match_results

(The results of a match are stored in it.)
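result[0] is the entire matched text, and result[1] is the text matched by the first capture group. If the regular expression has n capture groups, a successful match produces a match_results whose size() is n+1.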

This is a specialized allocator-aware container. It can only be default created, obtained from std::regex_iterator, or modified by std::regex_search or std::regex_match. Because std::match_results holds std::sub_matches, each of which is a pair of iterators into the original character sequence that was matched, it’s undefined behavior to examine std::match_results if the original character sequence was destroyed or iterators to it were invalidated for other reasons.

Type                        Definition
std::cmatch                 std::match_results<const char*>
std::wcmatch                std::match_results<const wchar_t*>
std::smatch                 std::match_results<std::string::const_iterator>
std::wsmatch                std::match_results<std::wstring::const_iterator>
std::pmr::cmatch (C++17)    std::pmr::match_results<const char*>
std::pmr::wcmatch (C++17)   std::pmr::match_results<const wchar_t*>
std::pmr::smatch (C++17)    std::pmr::match_results<std::string::const_iterator>
std::pmr::wsmatch (C++17)   std::pmr::match_results<std::wstring::const_iterator>

std::sub_match

Used to inspect the individual results stored in a match_results.
The class template std::sub_match is used by the regular expression engine to denote sequences of characters matched by marked sub-expressions.

regex_match

Returns true if the regular expression matches the entire target character sequence, false otherwise.

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Simple regular expression matching
    const std::string fnames[] = {"foo.txt", "bar.txt", "baz.dat", "zoidberg"};
    const std::regex txt_regex("[a-z]+\\.txt");

    for (const auto &fname : fnames)
        std::cout << fname << ": " << std::regex_match(fname, txt_regex) << '\n';
    /*
    foo.txt: 1
    bar.txt: 1
    baz.dat: 0
    zoidberg: 0
    */

    // Extraction of a sub-match
    const std::regex base_regex("([a-z]+)\\.txt");
    std::smatch base_match;

    for (const auto &fname : fnames)
    {
        if (std::regex_match(fname, base_match, base_regex))
        {
            // The first sub_match is the whole string; the next
            // sub_match is the first parenthesized expression.
            if (base_match.size() == 2)
            {
                std::ssub_match base_sub_match = base_match[1];
                std::string base = base_sub_match.str();
                std::cout << fname << " has a base of " << base << '\n';
            }
        }
    }
    /*
    foo.txt has a base of foo
    bar.txt has a base of bar
    */

    // Extraction of several sub-matches
    const std::regex pieces_regex("([a-z]+)\\.([a-z]+)");
    std::smatch pieces_match;

    for (const auto &fname : fnames)
    {
        if (std::regex_match(fname, pieces_match, pieces_regex))
        {
            std::cout << fname << '\n';
            for (size_t i = 0; i < pieces_match.size(); ++i)
            {
                std::ssub_match sub_match = pieces_match[i];
                std::string piece = sub_match.str();
                std::cout << "  submatch " << i << ": " << piece << '\n';
            }
        }
    }
}
/*
foo.txt
  submatch 0: foo.txt
  submatch 1: foo
  submatch 2: txt
bar.txt
  submatch 0: bar.txt
  submatch 1: bar
  submatch 2: txt
baz.dat
  submatch 0: baz.dat
  submatch 1: baz
  submatch 2: dat
*/

regex_search

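std::regex_search: searches the target for the regular expression, but unlike regex_match it does not require the entire character sequence to match. It also performs only a single search: as soon as one match is found it stops, and it does not go on to look for further matches.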
Determines if there is a match between the regular expression e and some subsequence in the target character sequence.
1) Analyzes the generic range [first, last). Match results are returned in m.
2) Analyzes a null-terminated string pointed to by str. Match results are returned in m.
3) Analyzes a string s. Match results are returned in m.
4-6) Equivalent to (1-3), but omit the match results.
7) Overload (3) is prohibited from accepting temporary strings; otherwise this function would populate match_results m with string iterators that become invalid immediately.

regex_search will successfully match any subsequence of the given sequence, whereas std::regex_match will only return true if the regular expression matches the entire sequence.

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string lines[] = {"Roses are #ff0000",
                           "violets are #0000ff",
                           "all of my base are belong to you"};

    std::regex color_regex("#([a-f0-9]{2})"
                           "([a-f0-9]{2})"
                           "([a-f0-9]{2})");

    // simple match
    for (const auto &line : lines)
    {
        std::cout << line << ": " << std::boolalpha
                  << std::regex_search(line, color_regex) << '\n';
    }
    std::cout << '\n';

    // show contents of marked subexpressions within each match
    std::smatch color_match;
    for (const auto &line : lines)
    {
        if (std::regex_search(line, color_match, color_regex))
        {
            std::cout << "matches for '" << line << "'\n";
            std::cout << "Prefix: '" << color_match.prefix() << "'\n";
            for (size_t i = 0; i < color_match.size(); ++i)
                std::cout << i << ": " << color_match[i] << '\n';
            std::cout << "Suffix: '" << color_match.suffix() << "'\n\n";
        }
    }

    // repeated search (see also std::regex_iterator)
    std::string log("Speed:\t366\n"
                    "Mass:\t35\n"
                    "Speed:\t378\n"
                    "Mass:\t32\n"
                    "Speed:\t400\n"
                    "Mass:\t30");
    std::regex r(R"(Speed:\t\d*)");
    std::smatch sm;
    while (std::regex_search(log, sm, r))
    {
        std::cout << sm.str() << '\n';
        log = sm.suffix();
    }

    // C-style string demo
    std::cmatch cm;
    if (std::regex_search("this is a test", cm, std::regex("test")))
        std::cout << "\nFound " << cm[0] << " at position " << cm.prefix().length() << '\n';
}

std::regex_replace

  1) Copies characters in the range [first, last) to out, replacing any sequences that match re with characters formatted by fmt. In other words:
    Constructs a std::regex_iterator object i as if by std::regex_iterator<BidirIt, CharT, traits> i(first, last, re, flags), and uses it to step through every match of re within the sequence [first,last).
    For each such match m, copies the non-matched subsequence (m.prefix()) into out as if by out = std::copy(m.prefix().first, m.prefix().second, out) and then replaces the matched subsequence with the formatted replacement string as if by calling out = m.format(out, fmt, flags).
    When no more matches are found, copies the remaining non-matched characters to out as if by out = std::copy(last_m.suffix().first, last_m.suffix().second, out) where last_m is a copy of the last match found.
    If there are no matches, copies the entire sequence into out as-is, by out = std::copy(first, last, out)
    If flags contains std::regex_constants::format_no_copy, the non-matched subsequences are not copied into out.
    If flags contains std::regex_constants::format_first_only, only the first match is replaced.
  2) Same as 1), but the formatted replacement is performed as if by calling out = m.format(out, fmt, fmt + std::char_traits<CharT>::length(fmt), flags).

3-4) Constructs an empty string result of type std::basic_string<CharT, ST, SA> and calls std::regex_replace(std::back_inserter(result), s.begin(), s.end(), re, fmt, flags).
5-6) Constructs an empty string result of type std::basic_string<CharT> and calls std::regex_replace(std::back_inserter(result), s, s + std::char_traits<CharT>::length(s), re, fmt, flags).

Return value
1-2) Returns a copy of the output iterator out after all the insertions.
3-6) Returns the string result which contains the output.

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
    std::string text = "Quick brown fox";
    std::regex vowel_re("a|e|i|o|u");

    // write the results to an output iterator
    std::regex_replace(std::ostreambuf_iterator<char>(std::cout),
                       text.begin(), text.end(), vowel_re, "*");

    // construct a string holding the results
    std::cout << '\n' << std::regex_replace(text, vowel_re, "[$&]") << '\n';
}
/*
Q**ck br*wn f*x
Q[u][i]ck br[o]wn f[o]x
*/

std::regex_iterator

It is the programmer’s responsibility to ensure that the std::basic_regex object passed to the iterator’s constructor outlives the iterator. Because the iterator stores a pointer to the regex, incrementing the iterator after the regex was destroyed accesses a dangling pointer.
If the part of the regular expression that matched is just an assertion (^, $, \b, \B), the match stored in the iterator is a zero-length match, that is, match[0].first == match[0].second.


#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
    const std::string s = "Quick brown fox.";

    std::regex words_regex("[^\\s]+");
    auto words_begin = std::sregex_iterator(s.begin(), s.end(), words_regex);
    auto words_end = std::sregex_iterator();

    std::cout << "Found " << std::distance(words_begin, words_end) << " words:\n";

    for (std::sregex_iterator i = words_begin; i != words_end; ++i)
    {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}
/*
Found 3 words:
Quick
brown
fox.
*/