sed cannot wild match garbled characters when using -z option (solved)


Post by miltonx »

I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:

Code:

echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)

When -z is removed, it correctly matches:

Code:

echo -e 'firstline\n2ndline' | sed -E "s|.*|x|"
output:
x
x

I'm running this on Debian 11.
Any ideas why this happens?

Last edited by miltonx on Wed Sep 20, 2023 2:29 am, edited 1 time in total.

Post by MochiMoppel »

miltonx wrote: Tue Sep 19, 2023 3:45 am

I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:

Code:

echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)

Works here (BW64) as expected. Outputs a single 'x'.


Post by Burunduk »

Works on Fossapup, same sed 4.7 as in Debian 11.

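With -z, sed treats the whole input as one line, so .* matches the line feeds too and they are replaced along with the rest; no newline is printed after the x. Maybe you've just overlooked that x before the prompt string.

Code: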

root# echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
xroot#
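
If the x is hard to spot in front of the prompt, dumping the output bytes or appending a newline makes it visible. A minimal sketch, assuming GNU sed (for -z) and od from coreutils; printf stands in for echo -e only so that it behaves the same in every shell:

```shell
# With -z the whole input is one record, so .* also swallows the line feeds.
# od -c prints each output byte; a "\n" would show up here if sed had written one.
out=$(printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|' | od -An -c)
echo "bytes written by sed:$out"

# Append a newline by hand so the x no longer hides in front of the prompt.
printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|'; echo
```

The trailing `; echo` is only a display aid; it does not change what sed writes.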

Post by miltonx »

Burunduk wrote: Tue Sep 19, 2023 5:49 am

Works on Fossapup, same sed 4.7 as in Debian 11.

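With -z, sed treats the whole input as one line, so .* matches the line feeds too and they are replaced along with the rest; no newline is printed after the x. Maybe you've just overlooked that x before the prompt string.

Code: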

root# echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
xroot#

Yes, I overlooked that x lurking there. :o :o :o


Post by miltonx »

But I still have a problem with the following sample.

There is a text file /z whose content (including some garbled characters) looks like this:

Code:

ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isomavc1mp423gp5
    creation_time   : 2019-01-04 16:08:31
    encoder         : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
  Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified

Running:

Code:

cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz

The resulting /zz is identical to the input (nothing matched):

Code:

ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isomavc1mp423gp5
    creation_time   : 2019-01-04 16:08:31
    encoder         : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
  Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified

Post by MochiMoppel »

miltonx wrote: Tue Sep 19, 2023 8:36 am

Running:

Code:

cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz

Results in

Code:

Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler

Works as expected.
Maybe you examined the input file instead of the output file...


Post by Burunduk »

Is it my turn to say something?

Well, this code puts zz into the system root directory next to z already there. For some reason I don't like it.

Other than that, it works OK. I think it'll work for you too if you copy the ffmpeg output back from the forum post. It's now a valid Unicode sequence, but it probably wasn't initially. You can try to run this:

LC_ALL=C sed -z -E "s|.*(Duration.*)At least.*|\1|" ffmpeg.out > ffmpeg.txt

If it works, the problem is in those garbled characters. See the GNU sed manual, section 5.9.1 (Invalid multibyte characters).

For example:

Code:

root# echo -e 'abÄba\nab\xc4ba'
abÄba
abÄba
root# echo -e 'abÄba\nab\xc4ba'| sed -E 's/a(.*)a/\1/'
bÄb
abÄba
root# echo -e 'abÄba\nab\xc4ba'| LC_ALL=C sed -E 's/a(.*)a/\1/'
bÄb
bÄb
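
To check whether a file really contains bytes that are not valid UTF-8 before blaming sed, iconv can serve as a validator. This is only a sketch, assuming iconv is installed; the helper name has_invalid_utf8 and the temporary test file are made up for illustration:

```shell
# has_invalid_utf8 FILE: succeeds when iconv cannot decode FILE as UTF-8.
# (hypothetical helper; iconv exits non-zero on an invalid input sequence,
#  and also when FILE cannot be read at all)
has_invalid_utf8() {
    ! iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo data: \304 is the octal spelling of \xc4, a lone byte that is invalid in UTF-8.
tmp=$(mktemp)
printf 'ab\304ba\n' > "$tmp"

if has_invalid_utf8 "$tmp"; then
    echo "invalid UTF-8 found: try running sed with LC_ALL=C"
else
    echo "valid UTF-8"
fi
rm -f "$tmp"
```

Running the same check on ffmpeg.out would tell you whether the LC_ALL=C workaround is likely to be needed.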

Post by miltonx »

Burunduk wrote: Tue Sep 19, 2023 8:19 pm

Is it my turn to say something?

Well, this code puts zz into the system root directory next to z already there. For some reason I don't like it.

Other than that, it works OK. I think it'll work for you too if you copy the ffmpeg output back from the forum post. It's now a valid Unicode sequence, but it probably wasn't initially. You can try to run this:

LC_ALL=C sed -z -E "s|.*(Duration.*)At least.*|\1|" ffmpeg.out > ffmpeg.txt

If it works, the problem is in those garbled characters. See the GNU sed manual, section 5.9.1 (Invalid multibyte characters).

For example:

Code:

root# echo -e 'abÄba\nab\xc4ba'
abÄba
abÄba
root# echo -e 'abÄba\nab\xc4ba'| sed -E 's/a(.*)a/\1/'
bÄb
abÄba
root# echo -e 'abÄba\nab\xc4ba'| LC_ALL=C sed -E 's/a(.*)a/\1/'
bÄb
bÄb

It's not good practice to put randomly named files under the / directory, but this was purely for quickly experimenting with this script.

After testing, it looks like the garbled characters caused the failure to match. The sed locale considerations page provides very good information. It answers my question.

It probably worked for @MochiMoppel because the garbled characters were modified when they were posted to this forum. When I copy the text back to /z, it also works. But when I run ffmpeg again and redirect the result to /z, sed fails.
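
For anyone landing here later, this is roughly how the fix looks when sed reads the ffmpeg report straight from a pipe. It is a sketch, not the exact command from this thread: fake_ffmpeg is a made-up stand-in that prints a shortened report containing one invalid byte, so the example runs without ffmpeg or a video file. With the real tool the report goes to stderr, so something like `ffmpeg -i video.mp4 2>&1` (hypothetical file name) would take its place.

```shell
# Stand-in for the real ffmpeg call (hypothetical). \304 is the octal form of
# \xc4, a lone byte that is not valid UTF-8, like the ones in handler_name.
fake_ffmpeg() {
    printf 'Input #0, from video.mp4:\n'
    printf '  Duration: 01:51:12.12, start: 0.000000\n'
    printf '      handler_name    : ab\304ba\n'
    printf 'At least one output file must be specified\n'
}

# LC_ALL=C makes sed treat every byte as one character, so .* can cross the bad byte.
result=$(fake_ffmpeg | LC_ALL=C sed -z -E 's|.*(Duration.*)At least.*|\1|')
printf '%s\n' "$result"
```

Setting LC_ALL=C only for the sed command keeps the rest of the shell session in its normal locale.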
