AshunBlog

正则表达式

基础知识

正则表达式是用于匹配字符串中字符组合的模式，在 JavaScript中，正则表达式也是对象。

正则表达式是在宿主环境下运行的，如js/php/node.js 等 , 即在不同的语言中都会有正则表达式。
本章讲解的知识在其他语言中知识也是可用的，但是会有些函数使用上的区别

对比分析

与普通函数操作字符串来比较，正则表达式可以写出更简洁、功能强大的代码。

下面使用获取字符串中的所有数字来比较函数与正则的差异。

1
2
3

let Str = "Ashuntefannao1.1324Ashun";
let numbers = [...Str].filter((char) => !isNaN(Number(char))).join("");
console.log(numbers);

使用正则表达式将简单得多

1
2
3

let Str = "Ashuntefannao1.1324Ashun";
let numbers = Str.match(/\d/g).join("");
console.log(numbers);

创建正则

JS提供字面量与对象两种方式创建正则表达式，推荐使用字面量形式创建RegExp，形式更加简洁。

字面量创建

使用//包裹的字面量创建方式是推荐的作法，但不能在//中使用变量作为匹配规则

let as = "Ashuntefannao.com";
let reg=/[A-Z]/;
console.log(reg.test(as)); //true
console.log(/[A-Z]/.test(as)); //true

//包裹的字面量创建的正则，其中不能应用变量，作为匹配规则。

//包裹的内容会被理解为正则语法，所以其中的z不是变量，是匹配z字符

let as = "Ashuntefannao.com";
let z = "shun";
console.log(/z/.test(as)); //false
console.log(/z/.test("zheng"))//true

虽然可以使用 eval 计算字符串表达式，将其转换为js语法来实现将变量解析到正则中，但是比较麻烦，所以有变量时建议使用下面所讲的对象创建方式

1
2
3

let as = "Ashuntefannao.com";
let z = "shun";
console.log(eval(`/${z}/`).test(as)); //true

对象创建

当正则需要动态创建时（应用str变量），建议使用对象方式

let as = "ashuntefannao";
let str = "ashun";
let reg = new RegExp(str);
console.log(reg.test(as)); //true

根据用户输入高亮显示内容，支持用户输入正则表达式

<body>
  <div id="content">ashuntefannao</div>
</body>
<script>
	let matchStr = prompt("请输入搜索的字符串,支持输入正则");
  let reg = new RegExp(matchStr, "g");
  let dom = document.querySelector("#content");
  dom.innerHTML = dom.innerHTML.replace(reg, (str) => {
  	console.log(str);
  	return `<strong style="color:red;">${str}</strong>`;
  });
</script>

通过对象创建正则，提取标签

<body>
 		<h1>ashuntefannao</h1>
    <h1>ASHUN</h1>
</body>
<script>
{
        function getElement(ele) {
          let html = document.body.innerHTML;
          let matchStr = `<(${ele})>.+</\\1>`;
          let reg = new RegExp(matchStr, "g");

          console.table(html.match(reg));
        }
        getElement("h1");
}
</script>

选择符

选择修释符| 这个符号带表或的关系，也就是 | 左右两侧有一个匹配到就可以。

要注意：使用的范围不同，得到的结果也不同

如果在整个表达式使用，则将整个表达式一分为二
同理，在原子组中使用，则只是将当前原子组一分为二

检测电话是否是上海或北京的坐机

let tel = "010-12345678";
//错误结果：只匹配 | 左右两边任一结果
console.log(tel.match(/010|020\-\d{7,8}/)); 

//正确结果：需要放在原子组中使用，将当前原子组一分为二
console.log(tel.match(/(010|020)\-\d{7,8}/));

匹配字符是否包含ashuntefannao 或 ashun

1 2	const title = "ashuntefannao"; console.log(/ashuntefannao\|ashun/.test(title)); //true

字符转义

\被转义字符

转义用于 改变字符的含义，用来处理 某个字符有多种语义时 的情况。
在正则中具有特殊意义的字符，使用时需要转义。
假如有这样的场景，如果我们想通过正则查找/符号，但是 /在正则中有特殊的意义。如果写成///这会造成解析错误，所以要使用转义语法 /\//来匹配。

1 2	const url = "https://www.ashuntefannao.com"; console.log(/https:\/\//.test(url)); //true

使用 RegExp 构建正则时在转义上会有些区别，下面是对象与字面量定义正则时区别

在普通字符串中，转义一次字符，结果不变
由于使用对象形式new RegExp(str,mode)创建正则时，第一个参数接收的是字符串，需要在字符串中定义匹配的规则，所以若使用特殊意义的匹配字符需要转义两次。

let price = 12.23;
//含义1: . 除换行外任何字符 	含义2: .普通点
//含义1: d 字母d   					含义2: \d 数字 0~9
console.log(/\d+\.\d+/.test(price));

//字符串中 \d 与 d 是一样的，所以在 new RegExp 时\d 即为 d
console.log("\d" == "d");

//使用对象定义正则时，可以先把字符串打印一样，结果是字面量一样的定义就对了
console.log("\\d+\\.\\d+");
let reg = new RegExp("\\d+\\.\\d+");
console.log(reg.test(price));

下面是网址检测中转义符使用

1 2	let url = "https://www.ashuntefannao.com"; console.log(url.match(/https?:\/\/w+\.\w+\.\w+/));

字符边界

使用字符边界符用于控制匹配内容的开始与结束约定。

边界符	说明
^	匹配字符串的开始
$	匹配字符串的结束，忽略换行符

匹配内容必须以www开始

1 2	const as = "www.ashuntefannao.com"; console.log(/^www/.test(as)); //true

匹配内容必须以.com结束

1 2	const as = "www.ashuntefannao.com"; console.log(/\.com$/.test(as)); //true

检测用户名长度为3~6位，且只能为字母。如果不使用 ^与$ 限制将得不到正确结果

<body>
  <input type="text" name="username" />
</body>

<script>
  	document.querySelector("[name='username']")
          .addEventListener("input", function () {
            let test = this.value.match(/^[a-zA-Z]{3,6}$/);
            console.log(test ? "正确" : "错误");
          });
</script>

let reg = /[a-zA-Z]{3,6}/;
    let reg1 = /^[a-zA-Z]{3,6}$/;
    
    //不使用^$限制，虽然字符串长度不足时也可返回正确结果
    //但字符串长度超出时，返回的是自前向后匹配的结果
    console.log("as".match(reg));							//null
    console.log("123ashuntefannao123".match(reg));	//["ashunt"]

//使用^$进行限制，以字母3-6位开头，并以字母3-6位结尾，则匹配结果为3-6位纯字母
    console.log("ashuntefannao".match(reg1)); //null
    console.log("as".match(reg1)); 						//null
    console.log("ashun".match(reg1)); 				//["ashun"]

//只是用^限制，则只要开头满足3-6位纯字母即可
    console.log("ashuntefannao123".match(/^[a-zA-Z]{3,6}/)); //["ashunt"]
    //只是用$限制，则只要结尾满足3-6位纯字母即可
    console.log("132ashuntefannao".match(/[a-zA-Z]{3,6}$/)); //["fannao"]

元字符

元字符是正则表达式中的最小元素，只代表单一（一个）字符。

普通元字符

普通元字符，即在正则中没有特殊意义的普通字符。

1
2
3

let a = "ashun";
console.log(/a/.test(a)); 		//true
console.log(a.match(/a/)[0]); //a

特殊元字符

元字符	说明	示例
\d	匹配任意一个数字	[0-9]
\D	与除了数字以外的任何一个字符匹配	[^0-9]
\w	与任意一个字母、数字、下划线匹配	[a-zA-Z0-9_]
\W	除了字母、数字、下划线以外与任何字符匹配	[^a-zA-Z0-9_]
\s	任意一个空白字符匹配，如空格，制表符`\t`，换行符`\n`	[\n\f\r\t\v]
\S	除了空白符外任意一个字符匹配	[^\n\f\r\t\v]
\p{prop}	配合`u`模式，匹配对应属性的字符	( \p{P} \| \p{sc=Han} )
.	匹配除换行符外的任意字符	[^\n]

使用体验

匹配任意数字

1 2	let as = "Ashuntefannao 2020"; console.log(as.match(/\d/g)); //["2", "0", "2", "0"]

匹配所有电话号码

let as = `
	张三:010-99999999,李四:020-88888888
`;

let res = as.match(/\d{3}-\d{7,8}/g);
console.log(res);

获取所有用户名

-在原子组[]中具有特殊含义，为了避免冲突，最好转义使用
下列情况是否转义，情况都相同，但在表示一些匹配范围时，就会有歧义

let as = `
张三:010-99999999,李四:020-88888888`;
let res = as.match(/[^:\d-,\s]+/g);

console.log(/[^:\d\-,\s]+/g)
console.log(res);

1 2	console.log("103azA".match(/[0\-9]/g)); //["0"] console.log("103azA".match(/[0-9]/g)); //["1","0","3"]

也可使用\p{prop}结合u模式，匹配属性为汉字的字符

1	console.log(as.match(/\p{sc=Han}{2,3}/gu)); //["张三","李四"]

匹配任意非数字

1	console.log(/\D/.test(2029)); //false

匹配字母数字下划线

1 2	let as = "ashun@"; console.log(as.match(/\w/g)); //["a", "s", "h", "u", "n"]

匹配除了字母,数字或下划线外与任何字符匹配

1	console.log(/\W/.test("@")); //true

匹配与任意一个空白字符

1 2	console.log(/\s/.test(" ")); //true console.log(/\s/.test("\n \t")); //true

匹配除了空白符外任意一个字符匹配

1 2	let as = "ashun@"; console.log(as.match(/\S/g)); //["a", "s", "h", "u", "n","@"]

如果要匹配字符串点"." 则需要转义

1
2
3

let as = `ashuntefannao@com`;
console.log(/ashuntefannao.com/i.test(as));  //true
console.log(/ashuntefannao\.com/i.test(as)); //false

使用.匹配除换行符外任意字符，下面匹配不到ashun.com 因为有换行符且没有使用g模式

const url = `
  https://www.Ashuntefanano.com
  ashun.com
`;
console.log(url.match(/.+/));

特殊元字符. 配合s模式，.就可以匹配任意字符 (能够匹配换行符)

使用s单行模式（忽略换行）时，可以匹配所有

let title = `
  <span>
    ashuntefannao
    ashun
  </span>
`;
let res = title.match(/<span>.*<\/span>/s);
console.log(res[0]);

正则中会将空格按普通字符对待

1
2
3

let tel = `010 - 999999`;
console.log(/\d+-\d+/.test(tel)); //false
console.log(/\d+ - \d+/.test(tel)); //true

所有字符

可以使用 [\s\S]、[\d\D]、[\w\W]等等来匹配所有字符

let as = `
  <span>
    ashuntefannao
    ashun
  </span>
`;
let res = as.match(/<span>[\s\S]+<\/span>/);
console.log(res[0]);

.结合s单行模式(忽略换行符)，也可匹配所有字符

1
2
3

……
let res = title.match(/<span>.*<\/span>/s);
console.log(res[0]);

匹配模式

正则表达式在执行时会按他们的默认执行方式进行处理，但有时候默认的处理方式总不能满足我们的需求，所以可以切换不同的匹配模式满足业务需求。

修饰符	说明
i	不区分大小写字母的匹配
g	全局搜索所有匹配内容
m	视为多行
s	视为单行忽略换行符，使用`.` 可以匹配所有字符
y	从 `regexp.lastIndex` 开始匹配
u	Unicode模式，能够正确处理四个字符的 UTF-16 编码

i

将所有ashuntefannao 统一为小写

1
2
3

let site = "ashuntefannao ASHUNTEFANNAO";
site = site.replace(/ashuntefannao/gi, "ashuntefannao");
console.log(site);

g

使用 g 模式，可以搜索到所有满足匹配规则的string

 {
        let as = "ashuntefannao";
        as = as.replace(/n/, "@");
        console.log(as); //没有使用 g 修饰符是，只替换了第一个 n
}
{
        let as = "ashuntefannao";
        as = as.replace(/n/g, "@");
        console.log(as); //使用全局修饰符后替换了全部的 n
}

m

m多行匹配模式，结合^$可约束匹配字符串中的每一行。

也就是说：字符串存在换行符\n时，字符串有几行，就匹配多少次。并且每一行都可使用^$进行约束。

默认情况下，无论字符串是否有换行符，都只有一个开头^和结尾$
结合m模式，每一行都有自身的^$

{
      let str = `
      # jsES6 #
      # vue2.0 #
      # ashunte #
      `;
      console.log(str.match(/^\s*#\s+.*\s#$/gm));	
      console.log(str.match(/^\s*#\s+.+#\s*$/g));//null
}

下例是将以 #数字开始的课程解析为对象结构，学习过后面讲到的原子组可以让代码简单些

{
  			let str = `
  				#1 js,200元 #
  				#2 php,300元 #
  				#9 ashuntefannao # 阿顺特烦恼
  				#3 node.js,180元 #
				`;
        let lessons = str.match(/^\s*#.+#$/gm);
        console.log(lessons);
        lessons = lessons
          .map((v) => v.replace(/\s*#\d\s*/, "").replace(/\s*#\s*/, ""))
          .map((v) => {
            let [name, price] = v.split(",");
            return { name, price };
          });
        console.log(JSON.stringify(lessons, null, 2));
}

s

s单行模式，在匹配时忽略换行符，结合.能够匹配所有字符

let str = `
      ashuntefannao.com
      999999999
      !@#$%^&*()_+-
`;
console.log(str.match(/.+/gs));

u

每个字符都有属性，如L属性表示是字母，P 表示标点符号
其他属性简写可以访问属性的别名网站查看。
在正则中，使用\p{prop}来匹配对应属性的字符
\p{prop}需要结合 u 模式才有效。

字符也有unicode文字系统属性 Script=文字系统，下面是使用 \p{sc=Han} 获取中文字符 han为中文系统，其他语言请查看文字语言表

 let user = `
        阿顺-157******01
        张三-132******88
 `;
let reg = /\p{sc=Han}+/gu;
console.log(user.match(reg));

使用 u 模式可以正确处理四个字符的 UTF-16 字节编码

let str = "𝒳𝒴";
console.table(str.match(/[𝒳𝒴]/)); //结果为乱字符"�"

console.table(str.match(/[𝒳𝒴]/u)); //结果正确 "𝒳"

lastIndex

reg.lastIndex 可以返回或者设置正则表达式开始匹配的位置

必须结合 g或y 模式使用
只对 exec 方法有效
使用 exec 匹配完所有结果时，lastIndex 会被重置为0

 {
        let text = `阿顺特烦恼希望大家没有烦恼,阿顺不喜欢烦恼`;
        let reg = /阿顺(.{2})/g;
        reg.lastIndex = 10; //从索引10开始搜索
        console.log(reg.exec(text));
        console.log(reg.lastIndex); //上次匹配的结束位置+1=下次匹配的开始位置:18

        reg = /\p{sc=Han}/gu;
        while ((res = reg.exec(text))) {
          console.log(res[0]);
        }
        console.log(reg.lastIndex);	//匹配完所有结果，lastIndex重置为0
}

reg.lastIndex只对reg.exec有效，对str.match()毫无影响

{
      let str = "ashuntefannao";
      let reg = /a/g;
      reg.lastIndex = 11;
      console.log(str.match(reg)); //["a","a","a"]
}

y

我们来对比 y 与g 模式，在结合lastIndex时的使用差异。

使用 g 模式会全局匹配，从头查到尾，一直匹配字符串

{
        let str = "ashuntefannao";
        let reg = /a/g;
        console.log(reg.exec(str));
        console.log(reg.lastIndex); //1
        console.log(reg.exec(str));
        console.log(reg.lastIndex); //9
        console.log(reg.exec(str));
        console.log(reg.lastIndex); //12
        console.log(reg.exec(str)); //null
        console.log(reg.lastIndex); //0
}

但使用y 模式后如果从 lastIndex 开始匹配不成功就不继续匹配了，匹配结束lastIndex也会置为0

let str = "fannao";
let reg = /a/y;
reg.lastIndex = 1;
console.log(reg.exec(str));
console.log(reg.lastIndex); //2
console.log(reg.exec(str)); //null
console.log(reg.lastIndex); //0

因为使用 y 模式可以在lastIndex匹配不到时停止匹配，起到及时止损的作用，在匹配下面字符中的qq时可以提高匹配效率

如果提前知道匹配目标 是连续在一起的 那么使用y模式会更加的高效
但由于需要结合lastIndex,所以只能够使用reg.exec结合循环匹配。

{
        let title = `阿顺特烦恼QQ群:11111111,999999999,88888888
        阿顺特烦恼希望大家没有烦恼,阿顺不喜欢烦恼`;

        let reg = /(\d+),?/y;
        reg.lastIndex = 9;
        while ((res = reg.exec(title))) console.log(res[1]);
}

原子表

在一组字符中匹配某个元字符，在正则表达式中通过元字符表来完成，就是放到[] (方括号)中。

使用语法

原子表	说明
[ … ]	只匹配其中的一个元字符
[^]	匹配`除了`其中所有字符的任意一个元字符
[0-9]	匹配0-9任何一个数字
[a-z]	匹配小写a-z任何一个字母
[A-Z]	匹配大写A-Z任何一个字母

实例操作

使用[]匹配其中任意字符即成功，下例中匹配Af其中一个字符，而不会当成一个整体来对待

1
2
3

const title = "Ashuntefannao";
console.log(/Af/.test(title)); //false
console.log(/[Af]/.test(title)); //true

日期的匹配

1 2	let time = "2022-02-23"; console.log(time.match(/\d{4}([-\/])\d{2}\1\d{2}/));

获取0~3间的任意数字

1 2	const num = "2"; console.log(/[0-3]/.test(num)); //true

匹配a~f间的任意字符

1 2	const char = "e"; console.log(/[a-f]/.test(char)); //true

顺序必须为升序，否则将报错

1 2	const num = "2"; console.log(/[3-0]/.test(num)); //SyntaxError

1 2	const as = "ashuntefannao"; console.log(/[f-a]/.test(as)); //SyntaxError

排除法 ^ 获取所有用户名

let user = `
张三:010-99999999,李四:020-88888888`;
let res = user.match(/[^\s:\d\-,]+/g);
console.log(res);

可以使用 [\s\S] 或 [\d\D]等等，匹配到所有字符包括换行符

1
2
3

...
const reg = /[\s\S]+/g;
...

下面是使用原子表知识删除所有标题

<body>
 		<div id="content">
 			ashuntefannao
 			<h1>ashun</h1>
 		</div>
    <h1>ashuntefannao</h1>
    <h1>ashuntefannao</h1>
    <h2>ASHUN</h2>
    <input type="text" name="username" />
</body>
<script>
        let html = document.body;
        console.log(html.innerHTML.match(/^\s*<(h[0-6])>.*<\/\1>\s*$/gm));

        html.innerHTML = html.innerHTML.replace(
          /^\s*<(h[0-6])>.*<\/\1>\s*$/gm,
          ""
        );
</script>

不需要转义的字符

有些字符在正则中具有特殊含义，但是在原子组中却不用转义，只视为普通字符。

| , 原子表中的|只是普通字符，不是选择符

1 2	console.log("ASHUN\|as".match(/[a\|A]/g)); //["A", "\|", "a"] console.log("ASHUN\|as".match(/(a\|A)/g)); //["A", "a"]

. 在原子表中也没有特殊含义，只是普通的字符"."。
+ 在原子表中也只是普通字符"+"

1
2
3

let site = "Ashuntefannao.com";
console.log(site.match(/[\w+.\w+]/g).join(""));
console.log(site.match(/[\w.]/g).join(""));

若想使用这些特殊含义的字符，可以使用原子组()结合选择符|，让匹配形成或的关系

1
2
3

let site = "Ashuntefannao.com";
console.log(site.match(/(\w+|\.)/g));
console.log(site.match(/(.+)/g));

原子组

如果一次要匹配多个元子，可以通过元子组完成
原子组与原子表的差别在于原子组一次匹配多个元子，而原子表则是匹配任意一个字符，而且原子组配合其它的方法，能够完成给更多的功能
元字符组用 () 包裹

下面使用原子组匹配 h1 标签，如果想匹配 h2 只需要把前面原子组改为 h2 即可。

匹配结束标签时的 \1 意为：应用第一个原子组匹配到的内容

1 2	const dom = `<h1>阿顺特烦恼</h1>`; console.log(/<(h1)>.+<\/\1>/.test(hd)); //true

基本使用

没有使用 g 模式时，自前向后只匹配到第一个，使用str.match\reg.exec方法匹配到的信息包含以下数据

属性	说明
0	匹配到的完整内容
1,2….	匹配到的各个原子组内容
index	匹配到的str在原字符串中的位置
input	原字符串
groups	命名分组

{
      let title = "<h3>Ashun阿顺</h3>";
      let reg = /<(h[0-9])>(.+)<\/\1>/;
      console.log(title.match(reg));
      console.log(reg.exec(title));
}

1
2
3

let as = "Ashuntefannao.com";
console.log(as.match(/fan(nao)\.(com)/));
//["fannao.com", "nao", "com", index: 7, input: "Ashuntefannao.com", groups: undefined]

下面使用原子组匹配标题元素

{
        let content = `
        <h1>阿顺特烦恼</h1>
        <span>阿顺</span>
        <h2>Ashun</h2>
      `;

        console.table(content.match(/<(h[1-6])[\s\S]*<\/\1>/g));
}

上面代码，在匹配标签对内容时，匹配的是任意字符，但是这样在同级别标题连续时，就会出错。

会将<h1>阿顺特烦恼</h1> \n <h1>ashun</h1>视为<h1>阿顺……ashun</h1>，两个标签的匹配结果只有一项

应该排除换行符。

{
        let content = `
        <h1>阿顺特烦恼</h1>
        <h1>Ashuntefannao</h1>
        <span>阿顺</span>
        <h2>Ashun</h2>
      `;

        console.table(content.match(/<(h[1-6])[\s\S]*<\/\1>/g));
        
        //标签内容应该排除换行符
        console.table(content.match(/<(h[1-6]).*<\/\1>/g));
        console.table(content.match(/^\s*<(h[1-6]).*<\/\1>\s*$/gm));
}

检测 0~100 的数值，使用 parseInt 将数值转为10进制

1	console.log(/^(\d{1,2}\|100)$/.test(parseInt(09, 10)));

邮箱匹配

下面使用原子组匹配邮箱

1
2
3

let email = "2300071698@qq.com";
let reg = /^[\w\-]+@[\w\-]+\.(com|org|cn|cc|net)$/i;
console.dir(email.match(reg));

如果邮箱是以下格式 Ashuntefannao@as.com.cn 上面规则将无效，需要定义以下方式

将xxx.这样的后缀，封装为分子组，匹配一个或多个

1
2
3

let email = `admin@Ashun.com.cn`;
let reg = /^[\w-]+@([\w-]+\.)+(org|com|cc|cn)$/;
console.log(email.match(reg));

引用原子组

\number 在匹配时引用第number个原子组匹配的数据
$number 在替换时引用第number个原子组匹配的数据。

下面将标题标签替换为 p标签

let content = `
  <h1>Ashuntefannao</h1>
  <span>阿顺</span>
  <h2>SHUN</h2>
`;

let reg = /<(h[1-6])>(.*)<\/\1>/gi;
console.log(content.replace(reg, `<p>$2</p>`));

? :

如果希望原子组只参与匹配，不返回到结果当中，可使用 ?: 处理。:?置于原子组头部。

下面是获取所有域名的示例

由于下面第二层嵌套的原子组都使用?:处理，则在返回结果中，便没有了属性2~n，但有属性1返回第一层原子组匹配到的域名部分

let webs = `
  https://www.Ashuntefannao.com
  http://Ashunwang.com
  https://Ashun.com
`;

let reg = /https?:\/\/((?:\w+\.)?\w+\.(?:com|org|cn))/gi;
while ((v = reg.exec(webs))) {
  console.dir(v);
}

别名?<>

默认情况下，在不使用g模式时，并结合str.match/reg.exec方法时，各原子组匹配的结果将会存储到1~n属性中，但如果原子组过多，也就代表RegExp越复杂，就越不容易找到对应的原子组匹配结果。

如果希望返回的组数据更清晰，可以为原子组命名，结果将保存在返回的 groups字段中

语法?<alia>,将其置于原子组头部即可

1 2	let title = "<h1>阿顺特烦恼</h1>"; console.dir(title.match(/<(?<tag>h[1-6])(?<content>.*)<\/\1>/));

引用别名原子组

别名不能在参与匹配时引用，若想引用对应原子组，还是通过\number进行引用

\number，参与匹配时引用
$<alia>，参与替换时使用

组别名使用 ?<> 形式定义，下面将标签替换为p标签

let txt = `
  <h1>Ashuntefannao</h1>
  <span>阿顺</span>
  <h2>SHUN</h2>
`;

let reg = /<(?<tag>h[1-6])>(?<content>[\s\S]*)<\/\1>/gi;
console.log(txt.replace(reg, `<p>$<content></p>`));

获取链接与网站名称组成数组集合

<body>
  <a href="https://astfn.github.io">AshunBlog</a>
  <a href="https://www.hdcms.com">hdcms</a>
  <a href="https://www.sina.com.cn">新浪</a>
</body>

<script>
   let body = document.body;
   let reg = /^\s*<a.+href=.*(?<link>https?:\/\/([-\w]+\.)+(?:cn|com|org|cc|github|io)).*>(?<name>.*)<\/a>\s*$/gim;
    let arr = [];
    while ((exec = reg.exec(body.innerHTML))) {
      arr.push(exec["groups"]);
    }
    console.log(arr);
</script>

重复匹配

基本使用

如果要重复匹配一些内容时我们要使用重复匹配修饰符，包括以下几种。

符号	说明
*	重复零次或更多次
+	重复一次或更多次
?	重复零次或一次
{n}	重复n次
{n,}	重复n次或更多次
{n,m}	重复n到m次

重复匹配可以应用于:

单个字符
原子表
原子组

因为正则最小单位是元字符，而我们很少只匹配一个元字符如a、b所以基本上重复匹配在每条正则语句中都是必用到的内容。

默认情况下,重复选项对单个字符进行重复匹配，是贪婪匹配，一只匹配到不重复为止。

1 2	let as = "assshun"; console.log(as.match(/as+/i)); //asss

使用原子组后则对整个组重复匹配

1 2	let as = "ashunAshuntefannao"; console.log(as.match(/(ashun)+/i)); //ashunAshun

下面是验证坐机号的正则

1 2	let tell = "010-12345678"; console.log(/^0\d{2,3}-\d{7,8}$/.exec(tell));

验证用户名只能为3~8位的字母或数字，并以字母开始

像这种严格约束的正则，需要用^…$进行约束，若不使用其进行约束，那么如果用户名符合字母\数字，但是位数超过8位，也能匹配成功，只不过只返回前8位。

<body>
  <input type="text" name="username" />
</body>
<script>
  {
        let input = document.querySelector("[name='username']");
        let reg = /^[a-zA-Z](\d|[a-zA-Z]){2,7}$/;
        input.addEventListener("input", function () {
          console.log(reg.test(this.value) ? "正确" : "错误");
        });
   }
</script>

验证密码必须包含大写字母并在5~10位之间

多种验证组合，可以将regexp置于数组中，遍历数组过程中对str进行判断

<body>
<input type="text" name="password" />
</body>
<script>
let input = document.querySelector(`[name="password"]`);
input.addEventListener("keyup", e => {
  const value = e.target.value.trim();
  const regs = [/^[a-zA-Z0-9]{5,10}$/, /[A-Z]/];
  let state = regs.every(v => v.test(value));
  console.log(state ? "正确！" : "密码必须包含大写字母并在5~10位之间");
});
</script>

禁止贪婪

正则表达式在进行重复匹配时，默认是贪婪匹配模式，也就是说会尽量匹配更多内容，但是有的时候我们并不希望他匹配更多内容，这时可以通过 ? 对重复匹配语法进行修饰，会尽可能的少匹配。

使用	说明
*?	0个或多个，但尽可能少重复
+?	1个或多个，但尽可能少重复
??	0个或1个，但尽可能少重复
{n,m}?	n~m个，但尽可能少重复
{n,}?	>=n个，但尽可能少重复

下面是禁止贪婪的语法例子

{
     let str = "assshun";
     // let reg = /as+?/;      //as
     // let reg = /as*?/;      //a
     // let reg = /as??/;      //a
     // let reg = /as{2,}?/;   //ass
     // let reg = /as{2,3}?/;  //ass

     console.log(str.match(reg));
}

将所有span更换为h5 并描红，并在内容前加上 阿顺-

之前我们提到过，在匹配标签时，标签对的内容要忽略换行符，为了避免同名标签连续时，合并匹配的情况
其实我们也可使用禁止贪婪，来进行约束，并且禁止贪婪情况下，也就不必使用m模式配合^$进行严格约束了。

<body>
  <main>
    <span>ashunwang</span>
    <span>ashun.com</span>
    <span>ashuntefannao.com</span>
  </main>
</body>
<script>
  const main = document.querySelector("main");
  const reg = /<span>([\s\S]+?)<\/span>/gi;
  main.innerHTML = main.innerHTML.replace(reg, (v, p1) => {
    console.log(p1);
    return `<h4 style="color:red">后盾人-${p1}</h4>`;
  });
</script>

下面是使用禁止贪婪查找页面中的标题元素

<body>
 		<div id="content">
      ashuntefannao
      <h1>ashun</h1>
    </div>
    <h1>ashuntefannao</h1>
    <h1>ashuntefannao</h1>
    <h2>ASHUN</h2>
</body>

<script>
{
        let body = document.body.innerHTML;
        let reg = /<(h[1-6])>[\s\S]*?<\/\1>/gi;
        // let reg = /^\s*?<(h[1-6])>[\s\S]*?<\/\1>\s*?$/gim;
        console.table(body.match(reg));
}
</script>

细节返回

问题分析

通过之前的使用，我们知道str.match方法：

在非g全局匹配模式下，只返回一个匹配结果，并且包含匹配细节参数
在g模式下，返回一个包含所有匹配内容的Array，但是每个元素不包含匹配细节

<body>
  <h1>Ashuntefannao.com</h1>
  <h2>ashun.com</h2>
  <h1>阿顺特烦恼</h1>
</body>

<script>
  function elem(tag) {
    const reg = new RegExp(`<(${tag}).*>.+?</\\1>`, "g");
    return document.body.innerHTML.match(reg);
  }
  console.table(elem("h1"));
</script>

matchAll

在新版本浏览器中支持使用 matchAll 操作，并返回迭代对象

需要添加 g 修饰符

{
     let str = "as";
     let reg = /./g;
     let iterator = str.matchAll(reg);
     console.log(iterator.next()); //{ value:Array(1), done:false }
     console.log(iterator.next()); //{ value:Array(1), done:false }
     console.log(iterator.next()); //{ value:undefined, done:true }
 }

可迭代对象可使用for/of遍历

let str = "Ashuntefannao";
let reg = /[a-z]/ig;
for (const iterator of str.matchAll(reg)) {
  console.log(iterator);
}

在原型定义 matchAll方法，用于在旧浏览器中工作，不需要添加g 模式运行

{
        String.prototype.matchAll = function (reg) {
          let match = this.match(reg);
          if (match) {
            let str = this.replace(reg, "^".repeat(match[0].length));
            let matchArr = str.matchAll(reg) || [];
            return [match, ...matchArr];
          }
        };
        let reg = /a/;
        console.log("ashuna".matchAll(reg));
}

exec

使用 g 模式修正符并结合 exec 循环操作可以获取结果和匹配细节

<body>
  <h1>ashuntefannao.com</h1>
  <h1>阿顺</h1>
  <h2>Ashun.com</h2>
</body>
<script>
  function search(string, reg) {
    const matchs = [];
    while ((data = reg.exec( string))) {
      matchs.push(data);
    }
    return matchs;
  }
  console.log(search(document.body.innerHTML, /<(h[1-6]).*?>[\s\S]+?<\/\1>/gi));
</script>

使用上面定义的函数来检索字符串中的网址

let sites = `https://ashunwang.com  
https://www.sina.com.cn
https://astfn.github.io`;

let res = search(sites, /https?:\/\/(\w+\.)?(\w+\.)+(com|cn|io)/gi);
console.dir(res);

字符方法

下面介绍的方法是 String 提供的支持正则表达式的方法

search

search(str/reg) 方法用于检索字符串中指定的子字符串，也可以使用正则表达式搜索，返回值为索引位置

1 2	let str = "Ashuntefannao.com"; console.log(str.search(".com"));

使用正则表达式搜索

1	console.log(str.search(/\.com/i));

match

直接使用字符串搜索

1 2	let str = "Ashuntefannao.com"; console.log(str.match("com"));

使用正则获取内容，下面是简单的搜索字符串, 非g模式下，只匹配一次，且返回详细参数信息。

let as = "Ashuntefannao";
let res = as.match(/a/i);
console.log(res);
console.log(res[0]); //匹配的结果
console.log(res[index]); //出现的位置

如果使用 g 修饰符时，就不会有结果的详细信息了（可以使用exec），下面是获取所有h1~6的标题元素

1
2
3

let body = document.body.innerHTML;
let result = body.match(/<(h[1-6]).*?>[\s\S]+?<\/\1>/g);
console.table(result);

matchAll

在新浏览器中支持使用 matchAll 结合g模式操作，并返回迭代对象

let str = "Ashuntefannao";
let reg = /[a-z]/ig;
for (const iterator of str.matchAll(reg)) {
  console.log(iterator);
}

split

用于使用字符串或正则表达式分隔字符串，下面是使用字符串分隔日期

1 2	let str = "2023-02-12"; console.log(str.split("-")); //["2023", "02", "12"]

如果日期的连接符不确定，那就要使用正则操作了

1 2	let str = "2023/02-12"; console.log(str.split(/-\|\//));

replace

replace 方法不仅可以执行基本字符替换，也可以进行正则替换，下面替换日期连接符

1 2	let str = "2023/02/12"; console.log(str.replace(/\//g, "-")); //2023-02-12

替换字符串可以插入下面的特殊变量名：

变量	说明
$$	插入一个 “$”。
$&	插入匹配的结果。
$`	插入当前匹配的子串左边的内容。
$’	插入当前匹配的子串右边的内容。
$n	假如第一个参数是 `RegExp` ，并且 n 是个小于100的非负整数，则`$n`意为：插入第 n 个原子组匹配的字符串。提示：索引是从1开始
$<alia>	与`$n`类似，也是插入原子组匹配的内容，但`$<alia>`是通过别名锁定对应原子组

在”阿顺”前后再分别添加2个=

1 2	let as = "=阿顺="; console.log(as.replace(/阿顺/g, "$`$`$&$'$'")); // ===阿顺===

把电话号用 - 连接

1 2	let tell = "(010)99999999 (020)8888888"; console.log(tell.replace(/$(\d{3,4})$(\d{7,8})/g, "$1-$2"));

把所有阿顺关键字加上链接 https://www.Ashuntefannao.com

<body>
  阿顺特烦恼希望大家没有烦恼，阿顺讨厌烦恼
</body>
<script>
  const body = document.body;
  body.innerHTML = body.innerHTML.replace(
    /阿顺/g,
    `<a href="https://www.Ashuntefannao.com">$&</a>`
  );
</script>

回调函数

replace 支持回调函数操作，用于处理复杂的替换逻辑，回调函数参数与str.match\str.matchAll\reg.exec返回的详细参数是对应的。

变量名	代表的值
`str`	匹配的子串。（对应于上述的$&。）
`p1,p2, ...`	假如replace()方法的第一个参数是一个 `RegExp` ，则代表第n个原子组匹配的字符串。
`index`	匹配到的子字符串在原字符串中的索引。（比如，如果原字符串是 `'abcd'`，匹配到的子字符串是 `'bc'`，那么这个参数将会是 1）
`source`	被匹配的原字符串。
NamedCaptureGroup	命名捕获组匹配的对象

将关于ashun的链接协议全部置为https ，并补全 www.

<body>
  <main>
    <a style="color: red" href="http://www.ashun.com"> 阿顺 </a>
    <a id="l1" href="http://Ashuntefannao.com">阿顺特烦恼</a>
    <a href="http://yahoo.com">雅虎</a>
    <h4>http://www.ashun.com</h4>
  </main>
</body>
<script>
 {
        let reg = /(<a.*?)(http)?(:\/\/)(www\.)?(Ashuntefannao|ashun)/g;
        let body = document.body;
        body.innerHTML = body.innerHTML.replace(reg, (val, ...args) => {
          args[1] = "https";
          args[3] = args[3] || "www.";
          return args.slice(0, 5).join("");
        });
        console.log(body.innerHTML.match(reg));
  }
</script>

将标题标签全部替换为 p 标签

<body>
  <h1>Ashuntefannao.com</h1>
  <h2>ashun.com</h2>
  <h1>阿顺特烦恼</h1>
</body>

<script>
  const reg = /<(h[1-6])>(.*?)<\/\1>/g;
  const body = document.body.innerHTML;
  const html = body.replace(reg, function(str, tag, content) {
    return `<p>${content}</p>`;
  });
  document.body.innerHTML = html;
</script>

删除页面中的 h1~h6 标签

<body>
  <h1>Ashuntefannao.com</h1>
  <h2>ashun.com</h2>
  <h1>阿顺特烦恼</h1>
</body>
<script>
  const reg = /<(h[1-6])>(.*?)<\/\1>/g;
  const body = document.body.innerHTML;
  const html = body.replace(reg, "");
  document.body.innerHTML = html;
</script>

使用回调函数将 阿顺 添加上链接

<body>
  <div class="content">
    阿顺特烦恼希望大家没有烦恼,阿顺讨厌烦恼
  </div>
</body>

<script>
  let content = document.querySelector(".content");
  content.innerHTML = content.innerHTML.replace("阿顺", function(
    search,
    index,
    source
  ) {
    return `<a href="https://www.Ashuntefannao.com">${search}</a>`;
  });
</script>

为所有标题添加上 hot 类

<body>
  <div class="content">
    <h1>Ashuntefannao.com</h1>
  	<h2>ashun.com</h2>
  	<h1>阿顺特烦恼</h1>
  </div>
</body>
<script>
  let content = document.querySelector(".content");
  let reg = /<(h[1-6])(.*?)>([\s\S]*?)<\/\1>/gi;
  content.innerHTML = content.innerHTML.replace(
    reg,
    (
      search, //匹配到的字符
      p1, //第一个原子组
      p2, //第二个原子组
      p3, //第三个原子组
      index, //索引位置
      source //原字符
    ) => {
      return `
    <${p1}${p2} class="hot">${p3}</${p1}>
    `;
    }
  );
</script>

正则方法

下面是 RegExp 正则对象提供的操作方法

test

reg.test(str)用于判断str是否符合匹配条件，返回Boolean

检测输入的邮箱是否合法

<body>
  <input type="text" name="email" />
</body>

<script>
  let email = document.querySelector(`[name="email"]`);
  email.addEventListener("keyup", e => {
    console.log(/^\w+@\w+\.\w+$/.test(e.target.value));
  });
</script>

exec

配合 g 模式使用时，reg.exec(str)可以循环调用直到全部匹配完。

配合 g 模式使用时，应一直操作同一个正则，即把正则定义为变量使用，这样才能够不断的向后匹配
使用 g 修饰符最后匹配不到时返回 null

计算内容中阿顺出现的次数

<body>
  <div class="content">
    阿顺特烦恼希望大家没有烦恼，阿顺讨厌烦恼	--阿顺
  </div>
</body>

<script>
  let content = document.querySelector(".content");
  let reg = /(?<tag>阿顺)/g;
  let num = 0;
  while ((result = reg.exec(content.innerHTML))) {
    num++;
  }
  console.log(`阿顺共出现${num}次`);
</script>

断言匹配

断言虽然写在扩号中但它不是组，所以不会在匹配结果中保存，可以将断言理解为正则中的条件。

断言匹配语法	含义
(?=exp)	将其放置于匹配内容的后面，用于约束后面的匹配内容

(?=exp)

零宽先行断言 (?=exp) 匹配后面为 exp 的内容

把后面内容为特烦恼 的阿顺加上链接.

断言匹配的结果不会返回到内容当中，以下replace接收到的第一个参数只有阿顺
断言匹配，只是一个约束条件。

<body>
  <main>
    阿顺特烦恼希望大家没有烦恼，阿顺讨厌烦恼。
  </main>
</body>

<script>
  const main = document.querySelector("main");
  const reg = /阿顺(?=特烦恼)/gi;
  main.innerHTML = main.innerHTML.replace(
    reg,
    v => `<a href="https://Ashuntefannao.com">${v}</a>`
  );
</script>

下面是将价格后面添加上 .00

<script>
  let lessons = `
    js,200元,300次
    php,300.00元,100次
    node.js,180元,260次
  `;
  let reg = /(\d+)(.00)?(?=元)/gi;
  lessons = lessons.replace(reg, (v, ...args) => {
    args[1] = args[1] || ".00";
    return args.splice(0, 2).join("");
  });
  console.log(lessons);
</script>

使用断言验证用户名必须为五位，下面正则体现断言是不是组，并且不在匹配结果中记录

<body>
  <input type="text" name="username" />
</body>

<script>
  document
    .querySelector(`[name="username"]`)
    .addEventListener("keyup", function() {
      let reg = /^(?=[a-z]{5}$)/i;
      console.log(reg.exec(this.value));
    });
</script>

(?<=exp)

零宽后行断言 ?<=exp 匹配前面为 exp 的内容

匹配前面是Ashuntefannao 的数字

1
2
3

let str = "Ashuntefannao789Ashun666";
let reg = /(?<=Ashuntefannao)\d+/i;
console.log(str.match(reg)); //789

匹配前后都是数字的内容

{
        let str = "Ashuntefannao789Ashun123";
        let reg = /(?<=\d)[a-z]+(?=\d)/gi;
        // let reg = /(?<=\d{3}).+(?=\d{3})/g;
        console.log(str.match(reg));
}

所有a标签的超链接替换为Ashuntefannao.com

<body>
  <a href="https://baidu.com">百度</a>
  <a href="https://yahoo.com">雅虎</a>
</body>
<script>
  const body = document.body;
  let reg = /(?<=<a.*href=(['"])).+?(?=\1)/gi;
  // console.log(body.innerHTML.match(reg));
  body.innerHTML = body.innerHTML.replace(reg, "https://Ashuntefannao.com");
</script>

把前面内容为阿顺 的特烦恼加上链接.

<body>
  <main>
    阿顺特烦恼希望大家没有烦恼，阿顺讨厌烦恼。
  </main>
</body>

<script>
  const main = document.querySelector("main");
  const reg = /(?<=阿顺)特烦恼/gi;
  main.innerHTML = main.innerHTML.replace(
    reg,
    v => `<a href="https://Ashuntefannao.com">${v}</a>`
  );
</script>

将电话的后四位模糊处理

let users = `
  阿顺电话: 12345678901
  客服电话: 98745675603
`;

let reg = /(?<=\d{7})\d+\s*/g;
users = users.replace(reg, str => {
  return "*".repeat(4);
});
console.log(users); //阿顺电话: 1234567**** 客服电话: 9874567****

获取标题中的内容

1
2
3

let th = `<h1>阿顺特烦恼</h1>`;
let reg = /(?<=<h1.*?>).*(?=<\/h1>)/g;
console.log(th.match(reg));

(?!exp)

零宽负向先行断言 后面不能出现 exp 指定的内容

使用 (?!exp)字母后面不能为两位数字

1
2
3

let str = "Ashun12";
let reg = /[a-z]+(?!\d{2})$/i;
console.table(reg.exec(str));

下例为用户名中不能出现阿顺

下例中，(?!exp)前面什么也没有，即""后面不准出现阿顺，也就是任意一个地方后面不能出现阿顺

<body>
  <main>
    <input type="text" name="username" />
  </main>
</body>
<script>
  const input = document.querySelector(`[name="username"]`);
  input.addEventListener("keyup", function() {
    const reg = /^(?!.*阿顺)[(\w)(\p{sc=Han})]{5,6}$/iu;
    console.log(this.value.match(reg));
  });
</script>

(?<!exp)

零宽负向后行断言 前面不能出现exp指定的内容

获取前面不是数字的字符

1
2
3

let str = "Ashun99shun";
let reg = /(?<!\d)[a-z]+/i;
console.log(reg.exec(str)); //Ashun

把所有不是以 https://oss.Ashuntfn.com 开始的静态资源替换为新网址

<body>
  <main>
    <a href="https://www.Ashuntfn.com/1.jpg">1.jpg</a>
    <a href="https://oss.Ashuntfn.com/2.jpg">2.jpg</a>
    <a href="https://cdn.Ashuntfn.com/2.jpg">3.jpg</a>
    <a href="https://Ashuntfn.com/2.jpg">3.jpg</a>
  </main>
</body>
<script>
  const main = document.querySelector("main");
  let reg = /https:\/\/(?<!oss\.).+?(?=\/)/gi;
  main.innerHTML = main.innerHTML.replace(reg, v => {
    console.log(v);
    return "https://oss.Ashuntfn.com";
  });
</script>