Discussion:
A head buildworld race visible in the ci.freebsd.org build history
Mark Millard
2018-06-16 05:55:11 UTC
While watching ci.freebsd.org builds I have seen a notable
number of one-time failures, such as this example from
powerpc64:

--- all_subdir_lib/libufs ---
ranlib -D libufs.a
ranlib: fatal: Failed to open 'libufs.a'
*** [libufs.a] Error code 70

where the next build works despite the change being
irrelevant to whatever ranlib complained about.

Other builds failed similarly:

--- all_subdir_lib/libbsm ---
ranlib -D libbsm_p.a
ranlib: fatal: Failed to open 'libbsm_p.a'
*** [libbsm_p.a] Error code 70

and:

--- kerberos5/lib__L ---
ranlib -D libgssapi_spnego_p.a
--- libgssapi_spnego.a ---
ranlib -D libgssapi_spnego.a
--- libgssapi_spnego_p.a ---
ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
*** [libgssapi_spnego_p.a] Error code 70

and so on.


It is not limited to powerpc64. For example, for aarch64
there are:

--- libpam_exec.a ---
building static pam_exec library
ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q`
ranlib -D libpam_exec.a
ranlib: fatal: Failed to open 'libpam_exec.a'
*** [libpam_exec.a] Error code 70

and:

--- all_subdir_lib/libusb ---
ranlib -D libusb.a
ranlib: fatal: Failed to open 'libusb.a'
*** [libusb.a] Error code 70

and:

--- all_subdir_lib/libbsnmp ---
ranlib: fatal: Failed to open 'libbsnmp.a'
--- all_subdir_lib/ncurses ---
--- all_subdir_lib/ncurses/panelw ---
--- panel.pico ---
--- all_subdir_lib/libbsnmp ---
*** [libbsnmp.a] Error code 70


Even amd64 gets such failures:

--- libpcap.a ---
ranlib -D libpcap.a
ranlib: fatal: Failed to open 'libpcap.a'
*** [libpcap.a] Error code 70

and:


--- libkafs5.a ---
ranlib: fatal: Failed to open 'libkafs5.a'
--- libkafs5_p.a ---
ranlib: fatal: Failed to open 'libkafs5_p.a'
--- cddl/lib__L ---
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: note: include the header <ctype.h> or explicitly provide a declaration for 'toupper'
--- kerberos5/lib__L ---
*** [libkafs5_p.a] Error code 70

make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
--- libkafs5.a ---
*** [libkafs5.a] Error code 70

and:


--- lib__L ---
ranlib -D libclang_rt.asan_cxx-i386.a
ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
*** [libclang_rt.asan_cxx-i386.a] Error code 70


(Notice the variability in which .a files the ranlib runs fail for.)
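
One way to tabulate which archives are affected is to scan saved console text for the failure line. The following is a minimal sketch, assuming the logs are available as plain text; the function name `failed_archives` is hypothetical, and the pattern simply matches the message as quoted above.

```python
import re
from collections import Counter

# Matches the failure line exactly as it appears in the logs quoted above.
RANLIB_FAILURE = re.compile(r"ranlib: fatal: Failed to open '([^']+)'")

def failed_archives(log_text):
    """Count how often each archive name appears in a ranlib open failure."""
    return Counter(RANLIB_FAILURE.findall(log_text))

# Sample text assembled from lines quoted in this message.
sample = """\
--- all_subdir_lib/libufs ---
ranlib -D libufs.a
ranlib: fatal: Failed to open 'libufs.a'
*** [libufs.a] Error code 70
--- libkafs5.a ---
ranlib: fatal: Failed to open 'libkafs5.a'
--- libkafs5_p.a ---
ranlib: fatal: Failed to open 'libkafs5_p.a'
"""

for name, count in sorted(failed_archives(sample).items()):
    print(name, count)
```

Counting by archive name makes it easy to see whether any one library dominates or whether the failures are spread across unrelated targets.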





===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Bryan Drewery
2018-06-18 19:42:46 UTC
I looked at this a few days ago and don't believe it's actually a build
race. I think there is something wrong with the ar/ranlib on that system
or something else. I've found no evidence of concurrent building of the
.a files in question.
--
Regards,
Bryan Drewery
Konstantin Belousov
2018-06-18 20:45:18 UTC
Post by Bryan Drewery
I looked at this a few days ago and don't believe it's actually a build
race. I think there is something wrong with the ar/ranlib on that system
or something else. I've found no evidence of concurrent building of the
.a files in question.
FWIW, I got a similar failure when I did the last checks for the OFED
commit. For me, it was libgcc.a.
Bryan Drewery
2018-06-18 22:27:01 UTC
Post by Konstantin Belousov
FWIW, I got the similar failure when I did last checks for the OFED
commit. For me, it was libgcc.a.
If it was -lgcc_s then it's a known rare build race due to
tools/install.sh not handling -S.
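
For context, install(1) documents -S as a "safe copy": the new file is written under a temporary name and then renamed onto the target, so the target is never missing or half-written while it is being replaced. The following is a minimal sketch of that idea only. It is not the code of install(1) or of tools/install.sh, and `safe_install` and the file name are hypothetical.

```python
import os
import tempfile

def safe_install(data, target):
    """Write data to target through a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temporary file in the target's directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace() renames over an existing target in a single step.
        os.replace(tmp_path, target)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

with tempfile.TemporaryDirectory() as objdir:
    lib = os.path.join(objdir, "libexample.a")  # hypothetical file name
    safe_install(b"old contents", lib)
    safe_install(b"new contents", lib)
    with open(lib, "rb") as f:
        print(f.read())
```

An installer that instead unlinks the old file before copying leaves a window in which a concurrent reader finds no file at all, which is the kind of race described here.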
--
Regards,
Bryan Drewery
Li-Wen Hsu
2018-06-18 22:31:14 UTC
Post by Bryan Drewery
If it was -lgcc_s then it's a known rare build race due to
tools/install.sh not handling -S.
It seems to be a more general problem. This one:

https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console

complains about libcuse_p.a, while this one:

https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console

complains about libfifolog.a
--
Li-Wen Hsu <***@FreeBSD.org>
https://lwhsu.org
Bryan Drewery
2018-06-18 22:33:56 UTC
Post by Li-Wen Hsu
https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console
https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console
calls for libfifolog.a
Well why is ar -> ranlib so special? Nothing else is failing.
What filesystem are these using for objdirs?
What revision is the host kernel?
--
Regards,
Bryan Drewery
Mark Millard
2018-06-18 21:03:56 UTC
Post by Bryan Drewery
I looked at this a few days ago and don't believe it's actually a build
race. I think there is something wrong with the ar/ranlib on that system
or something else. I've found no evidence of concurrent building of the
.a files in question.
Looking at a bunch of the failures, spanning multiple
FreeBSD-head-*-build types of builds, I see only:

NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME butler1.nyi.freebsd.org

for the failures that I looked at.

So your "on that system" might well be correct.
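
The same check can be done mechanically. This is a rough sketch, assuming each saved log has its NODE_NAME line available in the same text as the console output; the second node name in the sample is made up for contrast, and `failing_nodes` is a hypothetical helper.

```python
import re
from collections import Counter

NODE_NAME = re.compile(r"^NODE_NAME\s+(\S+)", re.MULTILINE)
FAILURE = "ranlib: fatal: Failed to open"

def failing_nodes(logs):
    """Tally the NODE_NAME of every log that contains a ranlib open failure."""
    tally = Counter()
    for text in logs:
        if FAILURE in text:
            match = NODE_NAME.search(text)
            tally[match.group(1) if match else "unknown"] += 1
    return tally

logs = [
    "NODE_NAME butler1.nyi.freebsd.org\nranlib: fatal: Failed to open 'libufs.a'\n",
    "NODE_NAME butler1.nyi.freebsd.org\nranlib: fatal: Failed to open 'libusb.a'\n",
    # Hypothetical second node with a clean build, for contrast.
    "NODE_NAME other-builder.example.org\nbuild finished\n",
]
print(failing_nodes(logs))
```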


Li-Wen Hsu
2018-06-18 22:27:27 UTC
On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain
Post by Mark Millard
Looking at a bunch of the failures, spanning multiple
NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME butler1.nyi.freebsd.org
for the failures that I looked at.
So your "on that system" might well be correct.
Thanks for the insight. The build is done in an 11.1-R jail on a
-CURRENT host. butler1.nyi is running r333388 (as a canary) while the
other builders are mostly running r328278. I upgraded a few others and
they seem to reproduce the issue, so I have now downgraded all the build
slaves to r328278 until we find the root cause.

Li-Wen

--
Li-Wen Hsu <***@FreeBSD.org>
https://lwhsu.org
Bryan Drewery
2018-06-18 23:08:24 UTC
Post by Mark Millard
ranlib -D libpcap.a
ranlib: fatal: Failed to open 'libpcap.a'
Where is this error even coming from? It's not in the usr.bin/ar code
and ranlib does not cause it.

# ranlib -D uh
ranlib: warning: uh: no such file
--
Regards,
Bryan Drewery
Mark Millard
2018-06-19 01:03:12 UTC
Post by Bryan Drewery
Post by Mark Millard
ranlib -D libpcap.a
ranlib: fatal: Failed to open 'libpcap.a'
Where is this error even coming from? It's not in the usr.bin/ar code
and ranlib does not cause it.
# ranlib -D uh
ranlib: warning: uh: no such file
A more complete sequence is (with some
other text mixed in, as in where I got
the text from on ci.freebsd.org):

--- libvgl.a ---
building static vgl library
ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
--- all_subdir_lib/libsysdecode ---
ranlib -D libsysdecode.a
--- all_subdir_lib/libvgl ---
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'
--- all_subdir_lib/libsysdecode ---
ranlib: fatal: Failed to open 'libsysdecode.a'
--- all_subdir_lib/libvgl ---
*** [libvgl.a] Error code 70

So, in essence,

ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'

It is not obvious to me that the "Failed to open" means
that there was "no such file". Might there be some other
form of "Failed to open" for a file that does exist from
the ar at least having created its output .a file?
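
To illustrate the general point: an open can fail for reasons other than a missing file, and a message that omits the errno hides which one occurred. The sketch below assumes a POSIX system and says nothing about what actually happened on the CI host; `open_failure_reason` is a hypothetical helper.

```python
import errno
import os
import tempfile

def open_failure_reason(path):
    """Return the symbolic errno name if opening path for reading fails, else None."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "libvgl.a")   # never created in this sketch
    print(open_failure_reason(missing))       # the path does not exist
    print(open_failure_reason(tmp))           # the path exists but is a directory
```

Both calls would be reported simply as a failure to open if the caller discarded the errno, which is why the log text alone cannot settle whether the .a file existed.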


Mark Millard
2018-06-19 01:46:41 UTC
Post by Mark Millard
It is not obvious to me that the "Failed to open" means
that there was "no such file". Might there be some other
form of "Failed to open" for a file that does exist from
the ar at least having created its output .a file?
Also, if what varies between failing and working builders is the head
version of the host system, and what stays the same is the 11.1R jail,
then it would seem that the underlying head system software is what
matters for the ar -> ranlib sequence behavior: not 11.1R's ar or
ranlib, nor the 11.1R libraries they use, nor head's ar or ranlib (or
the libraries those use), since head's userland tools are unused in
the jail.

The only parts of head that could be involved are parts that the
11.1R jail does not avoid.

This suggests more basic infrastructure in head to me.

Bryan Drewery
2018-06-18 23:29:05 UTC
Post by Li-Wen Hsu
Thanks for the insight, the build is done in a 11.1-R jail on a
-CURRENT host. butler1.nyi is running r333388 (as a canary) while
other builders are mostly running r328278. I upgraded few others and
it seems can reproduce the issue, and now I downgraded all the build
slaves to r328278 before we find the root cause.
The error is coming from libarchive, which had a change between those revisions:
------------------------------------------------------------------------
r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines
Sync libarchive with vendor.
PR #893: delete dead ppmd7 alloc callbacks
PR #904: Fix archive freeing bug in bsdcat
PR #961: Fix ZIP format names
PR #962: Don't modify attributes for existing directories
when ARCHIVE_EXTRACT_NO_OVERWRITE is set
PR #964: Fix -Werror=implicit-fallthrough= for GCC 7
PR #970: zip: Allow backslash as path separator
MFC after: 1 week
------------------------------------------------------------------------
Nothing obvious stands out in the change to me though from a brief look.
--
Regards,
Bryan Drewery
Ed Maste
2018-06-19 00:35:57 UTC
Post by Bryan Drewery
The error is coming from libarchive which had a change between those
Post by Li-Wen Hsu
------------------------------------------------------------------------
r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.

Can we update a canary builder to somewhere between r328278 and r333388?
Li-Wen Hsu
2018-06-19 15:02:54 UTC
Post by Ed Maste
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.
Can we update a canary builder to somewhere between r328278 and r333388?
butler1.nyi.freebsd.org is running r331373 now.
--
Li-Wen Hsu <***@FreeBSD.org>
https://lwhsu.org
Mark Millard
2018-06-20 01:23:53 UTC
Post by Li-Wen Hsu
Post by Ed Maste
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.
Can we update a canary builder to somewhere between r328278 and r333388?
butler1.nyi.freebsd.org is running r331373 now.
But there seems to be another of the ar -> ranlib failures
after that on butler1.nyi.freebsd.org:

https://ci.freebsd.org/job/FreeBSD-head-powerpc-build/6321/ shows:

22:12:05 --- _bootstrap-tools-lib/liby ---
22:12:05 ranlib -D liby.a
22:12:05 ranlib: fatal: Failed to open 'liby.a'
22:12:05 *** [liby.a] Error code 70


with:

NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME butler1.nyi.freebsd.org



And in fact there is at least one more:

https://ci.freebsd.org/job/FreeBSD-head-sparc64-build/8291/consoleText

shows:

--- all_subdir_lib/libipsec ---
ranlib -D libipsec_p.a
ranlib: fatal: Failed to open 'libipsec_p.a'
*** [libipsec_p.a] Error code 70



Li-Wen Hsu
2018-06-20 04:14:27 UTC
Post by Mark Millard
Post by Li-Wen Hsu
Post by Ed Maste
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.
Can we update a canary builder to somewhere between r328278 and r333388?
butler1.nyi.freebsd.org is running r331373 now.
But there seems to be another of the ar -> ranlib failures
Yes, I was trying to narrow down the cause; it now seems to lie between
r328278 and r330304.

butler1.nyi.freebsd.org is back to running r328278, and I'll try to
reproduce this elsewhere.
--
Li-Wen Hsu <***@FreeBSD.org>
https://lwhsu.org
Mark Millard
2018-06-20 05:44:22 UTC
Post by Li-Wen Hsu
Post by Mark Millard
Post by Li-Wen Hsu
Post by Ed Maste
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.
Can we update a canary builder to somewhere between r328278 and r333388?
butler1.nyi.freebsd.org is running r331373 now.
But there seems to be another of the ar -> ranlib failures
Yes I was trying to narrow down the cause, now it seems between
r328278 and r330304.
butler1.nyi.freebsd.org is back to run r328278. And I'll try to
reproduce this in elsewhere.
Okay. Then I'll quit watching butler1.nyi.freebsd.org in order to report
whether it is working or failing (implicitly: search-direction information).

I will report if I see any new examples. (Seems unlikely.)


Side note . . .

It took me a while to find where to look for the head version
and jail version involved. For what I reported (powerpc):

22:12:03 uname:
22:12:03 FreeBSD FreeBSD-head-powerpc-build.jail.ci.FreeBSD.org 11.1-RELEASE FreeBSD 12.0-CURRENT #0 r330304M: Sat Mar 3 02:23:02 UTC 2018 ***@build-12.freebsd.org:/usr/obj/usr/src/sys/CLUSTER12 amd64

Now I know.
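
For anyone else reading these logs, the two versions can be pulled out of that line mechanically. This is a minimal sketch, assuming the line always has the shape shown above: the third field is the release the jail reports, and the version string that follows describes the host kernel. The field names are my own labels, and reading the trailing "M" as a locally modified source tree is an assumption based on svnversion's convention.

```python
import re

UNAME_RE = re.compile(
    r"^FreeBSD (?P<node>\S+) (?P<jail_release>\S+) "
    r"FreeBSD (?P<host_branch>\S+) #\d+ r(?P<host_rev>\d+)(?P<modified>M?):"
)

def parse_uname(line):
    """Split a uname line of the shape above into jail and host kernel details."""
    m = UNAME_RE.match(line)
    if m is None:
        return None
    return {
        "jail_release": m.group("jail_release"),
        "host_branch": m.group("host_branch"),
        "host_revision": int(m.group("host_rev")),
        "locally_modified": m.group("modified") == "M",
    }

# The line quoted above, cut off after the build date.
line = ("FreeBSD FreeBSD-head-powerpc-build.jail.ci.FreeBSD.org 11.1-RELEASE "
        "FreeBSD 12.0-CURRENT #0 r330304M: Sat Mar 3 02:23:02 UTC 2018")
print(parse_uname(line))
```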


Mark Millard
2018-06-21 21:48:59 UTC
Post by Li-Wen Hsu
Post by Mark Millard
Post by Li-Wen Hsu
Post by Ed Maste
Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.
Can we update a canary builder to somewhere between r328278 and r333388?
butler1.nyi.freebsd.org is running r331373 now.
But there seems to be another of the ar -> ranlib failures
Yes I was trying to narrow down the cause, now it seems between
r328278 and r330304.
butler1.nyi.freebsd.org is back to run r328278. And I'll try to
reproduce this in elsewhere.
Has the range r328278 < PROBLEM_START <= r330304 been narrowed down
some more?

(I'm just curious where the problem started.)
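
Narrowing such a range is a plain bisection. The sketch below treats revision numbers as integers and assumes a reliable `is_bad` test for a host kernel at a given revision. Because the failure is intermittent, in practice a "good" verdict would need several clean builds, and not every number in the range is necessarily a head commit. The starting revision used in the demo is invented purely to exercise the code.

```python
def first_bad_revision(good, bad, is_bad):
    """Bisect for the smallest revision r with good < r <= bad where is_bad(r) holds.

    Assumes is_bad(good) is False, is_bad(bad) is True, and that every
    revision after the first bad one is also bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad

# Hypothetical predicate for illustration only: pretend the problem
# started at r329000. The real starting revision is not known here.
PRETEND_START = 329000
tested = []

def is_bad(rev):
    tested.append(rev)
    return rev >= PRETEND_START

print(first_bad_revision(328278, 330304, is_bad), len(tested))
```

With r328278 known good and r330304 known bad, the range holds about two thousand revisions, so roughly eleven host upgrades would be enough to pin down the first bad one under these assumptions.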


