sed cannot wild match garbled characters when using -z option (solved)

Posted: Tue Sep 19, 2023 3:45 am
by miltonx

I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:

Code: Select all

echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)

When -z is removed, it correctly matches:

Code: Select all

echo -e 'firstline\n2ndline' | sed -E "s|.*|x|"
output:
x
x

I'm running this on Debian 11.
Any ideas why this happens?


Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 4:55 am
by MochiMoppel
miltonx wrote: Tue Sep 19, 2023 3:45 am

I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:

Code: Select all

echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)

Works here (BW64) as expected. Outputs a single 'x'.


Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 5:49 am
by Burunduk

Works on Fossapup, same sed 4.7 as in Debian 11.

With -z the line feeds are part of the pattern space, so .* matches them and they get replaced too, including the final one. Maybe you've just overlooked that x before the prompt string.

Code: Select all

root# echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
xroot#
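A quick way to spot output that lacks a trailing newline is to count the bytes sed writes, or to append the newline yourself. This is only a minimal sketch, assuming GNU sed (-z is a GNU extension); the variable name bytes is made up for the example.

```shell
# Assumes GNU sed; -z (--null-data) is a GNU extension.
# Count the bytes sed writes. A lone "x" without a newline is exactly 1 byte.
bytes=$(printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|' | wc -c)
echo "bytes written: $bytes"

# Run the same command again and append a newline afterwards,
# so the x ends up on its own line instead of in front of the prompt.
printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|'
echo
```

Piping the result through od -c instead of wc -c would show the individual bytes.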

Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 8:22 am
by miltonx

Yes, I overlooked that x lurking there. :o :o :o


Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 8:36 am
by miltonx

But I still have a problem with the following sample.

There is a text file /z with content (including some garbled characters) like this:

Code: Select all

ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isomavc1mp423gp5
    creation_time   : 2019-01-04 16:08:31
    encoder         : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
  Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified

Running:

Code: Select all

cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz

The resulting /zz is identical to the input (nothing matched):

Code: Select all

ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isomavc1mp423gp5
    creation_time   : 2019-01-04 16:08:31
    encoder         : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
  Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified

Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 10:22 am
by MochiMoppel
miltonx wrote: Tue Sep 19, 2023 8:36 am

Running:

Code: Select all

cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz

Results in

Code: Select all

Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
    Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:31
      handler_name    : Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:44
      handler_name    : [91xinpian.com]Î޵ÐÆƻµÍõ2£ºÂ´Ã³Ã„ֻ¥ÁªÃÃ¸HD1080P¸ßÇåӢÓïÖÐ×ÖÎÞˮӡ_track2_und_track1_und.aac
    Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 OD Handler
    Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
    Metadata:
      creation_time   : 2019-01-04 16:08:47
      handler_name    : GPAC MPEG-4 Scene Description Handler

Works as expected.
Maybe you examined the input file instead of the output file...


Re: sed cannot wild match when using -z option

Posted: Tue Sep 19, 2023 8:19 pm
by Burunduk

Is it my turn to say something?

Well, this code puts zz into the system root directory next to z already there. For some reason I don't like it.

Other than that, it works OK. I think it'll work for you too if you copy the ffmpeg output back from this page. As posted here it's a valid Unicode sequence, but the original ffmpeg output probably wasn't. You can try to run this:

LC_ALL=C sed -z -E "s|.*(Duration.*)At least.*|\1|" ffmpeg.out > ffmpeg.txt

If it works, the problem is in those garbled characters. See the GNU sed manual, section 5.9.1.

For example:

Code: Select all

root# echo -e 'abÄba\nab\xc4ba'
abÄba
abÄba
root# echo -e 'abÄba\nab\xc4ba'| sed -E 's/a(.*)a/\1/'
bÄb
abÄba
root# echo -e 'abÄba\nab\xc4ba'| LC_ALL=C sed -E 's/a(.*)a/\1/'
bÄb
bÄb
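To check whether a file really contains bytes that are not valid UTF-8, one option is to run it through iconv, which exits with a non-zero status on invalid input. This is a rough sketch, assuming an iconv with UTF-8 support is installed; the temporary file and the variable names valid and matched are made up for the example.

```shell
# Hypothetical sample: line 1 holds a valid UTF-8 "Ä" (bytes 0xC3 0x84),
# line 2 holds the lone byte 0xC4, which is not valid UTF-8 on its own.
tmp=$(mktemp)
printf 'ab\303\204ba\nab\304ba\n' > "$tmp"

# iconv fails on the stray byte, so this reports "no" for the sample.
if iconv -f UTF-8 -t UTF-8 "$tmp" > /dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "valid UTF-8: $valid"

# In the C locale every byte counts as one character, so .* can span
# the stray byte and the substitution succeeds on both lines.
matched=$(LC_ALL=C sed -n -E 's/a(.*)a/\1/p' "$tmp" | wc -l)
echo "lines matched in C locale: $matched"

rm -f "$tmp"
```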

Re: sed cannot wild match when using -z option

Posted: Wed Sep 20, 2023 2:18 am
by miltonx

It's not good practice to put randomly named files directly under /, but this was purely for quickly experimenting with this script.

After testing, it looks like the garbled characters caused the failure to match. The sed locale considerations page provides very good information. It answers my question.

@MochiMoppel probably got it to work because the garbled characters were modified when posted to this forum. When I copy the posted text back to /z, it works for me too. But when I run ffmpeg again and redirect its output to /z, sed fails.
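For anyone who wants to reproduce this without ffmpeg, here is a scaled-down sketch of the same situation. It assumes GNU sed; the sample text, the stray bytes and the variable names are invented for the illustration and are not the real ffmpeg output.

```shell
# Hypothetical miniature of the ffmpeg output. \315\365 are raw bytes
# that do not form a valid UTF-8 sequence.
sample=$(mktemp)
printf 'header\n  Duration: 00:00:01.00\n  handler_name : \315\365\n' > "$sample"
printf 'At least one output file must be specified\n' >> "$sample"

# With LC_ALL=C, sed treats each byte as a character, so .* can cross
# the invalid bytes and the whole record matches.
extracted=$(LC_ALL=C sed -z -E 's|.*(Duration.*)At least.*|\1|' "$sample")
printf '%s\n' "$extracted"

rm -f "$sample"
```

In a UTF-8 locale the same command without LC_ALL=C would be expected to leave the text unchanged, as described above.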