Container Adventures 2
This whole thing happened after a couple more trials in the container world and gave birth to these foolish junior PRs/issues on kubernetes/minikube:
#15678 ca3
#15696 ca3
#15677 ca3 - the issue.. but the discussion is on slack
#15491 ca3 – not able to rebase + change in workflow
#15697 the create-volume bug
#15699 the create-volume proposed solution
Solving container creation issues for the podman driver – minikube
We were able to merge the newly proposed cache-invalidation mechanism (not yet actually.. it’s still under discussion here, but mainly on slack), based on contentDigest.. so now the kicBase’s cache interactions should be something more generic.
Thanks to this, we were able to define a new entity called kicDriver, which is something more generic than docker or podman.. it's a mechanism that takes the common aspects of both (maybe in the future it will also support something else.. who knows) and puts them to work, all packed in a generic interface.
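(To make "a generic interface" a bit more concrete: something in the spirit of the sketch below. The names and methods here are invented for illustration; the actual kicDriver interface is whatever the PR ends up defining.)
// Invented for illustration only -- not the actual kicDriver from the PR.
// The idea: one interface over the operations docker and podman share, so the
// cache/start logic stops caring which binary sits underneath.
type ociRuntime interface {
	// Name of the underlying binary ("docker" or "podman")
	Name() string
	// PullBase pulls the kicBase image into the runtime's local store
	PullBase(image string) error
	// ImageDigest returns the content digest of a locally stored image,
	// which is the kind of value a contentDigest-based cache invalidation keys off
	ImageDigest(image string) (string, error)
}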
Now.. even tho the cache phase of minikube start seems to work with the podman driver, only the rootful podman makes it past the container creation phase.. For the rootless podman we have the following:
😄 minikube v1.28.0 on whatever..
▪ MINIKUBE_ROOTLESS=true
✨ Using the podman driver based on user configuration
📌 Using rootless Podman driver
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image to minikube cache ...
💾 Downloading Kubernetes v1.25.3 preload ...
> preloaded-images-k8s-v18-v1...: 406.99 MiB / 406.99 MiB 100.00% 56.74 M
> gcr.io/k8s-minikube/kicbase...: 404.96 MiB / 404.96 MiB 100.00% 22.67 M
⌛ Loading KicDriver with base image ...
🔥 Creating podman container (CPUs=2, Memory=8000MB) ...
✋ Stopping node "minikube" ...
🔥 Deleting "minikube" in podman ...
🤦 StartHost failed, but will try again: creating host: create: creating: create kic node: container name "minikube": log: 2023-01-23T15:07:03.883512000+02:00 + grep -qw cpu /sys/fs/cgroup/cgroup.controllers
2023-01-23T15:07:03.884604000+02:00 + echo 'ERROR: UserNS: cpu controller needs to be delegated'
2023-01-23T15:07:03.884740000+02:00 ERROR: UserNS: cpu controller needs to be delegated
2023-01-23T15:07:03.884872000+02:00 + exit 1: container exited unexpectedly
🔥 Creating podman container (CPUs=2, Memory=8000MB) ...
😿 Failed to start podman container. Running "minikube delete" may fix it: creating host: create: creating: setting up container node: creating volume for minikube container: podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:
stderr:
Error: volume with name minikube already exists: volume already exists
❌ Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: creating: setting up container node: creating volume for minikube container: podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:
stderr:
Error: volume with name minikube already exists: volume already exists
the “rootless podman” being:
$ minikube config set driver podman
$ minikube config set container-runtime crio
$ minikube config set rootless true
There’s something wrong with the container creation step with this driver config..
I’ve got no idea what this mechanism looks like.. I’ve only worked with the kicBase cache so far.
I’d start by looking at what the mechanism as a whole looks like;
so where does the “Creating podman container…” come from?
## from root of minikube..
$ git grep -n "Creating %s container" ## this produced no output
$ git grep -n "Creating" ## this obviously produced a whole lot of it..
I looked for a container reference inside the output.. but eventually stumbled upon this:
pkg/minikube/machine/start.go:399: out.Step(style.StartingVM, "Creating {{.driver_name}} {{.machine_type}} (CPUs={{.number_of_cpus}}, Memory={{.memory_size}}MB) ...", out.V{"driver_name": cfg.Driver, "number_of_cpus": cfg.CPUs, "memory_size": cfg.Memory, "machine_type": machineType})
which is part of this:
// pkg/minikube/machine/start.go
func showHostInfo(h *host.Host, cfg config.ClusterConfig) {
// ......
if driver.IsKIC(cfg.Driver) { // TODO:medyagh add free disk space on docker machine
register.Reg.SetStep(register.CreatingContainer)
out.Step(style.StartingVM, "Creating {{.driver_name}} {{.machine_type}} (CPUs={{.number_of_cpus}}, Memory={{.memory_size}}MB) ...", out.V{"driver_name": cfg.Driver, "number_of_cpus": cfg.CPUs, "memory_size": cfg.Memory, "machine_type": machineType})
return
}
which is called by this:
// pkg/minikube/machine/start.go
func createHost(api libmachine.API, cfg *config.ClusterConfig, n *config.Node) (*host.Host, error) {
klog.Infof("createHost starting for %q (driver=%q)", n.Name, cfg.Driver)
// ...
// config read and some other setup stuff...
if err := timedCreateHost(h, api, cfg.StartHostTimeout); err != nil {
return nil, errors.Wrap(err, "creating host")
}
klog.Infof("duration metric: libmachine.API.Create for %q took %s", cfg.Name, time.Since(cstart))
if cfg.Driver == driver.SSH {
showHostInfo(h, *cfg) // <-- where we come from..
}
if err := postStartSetup(h, *cfg); err != nil {
return h, errors.Wrap(err, "post-start")
}
Now I bet we’ll find the string with the crying cat 😿 from minikube’s problematic output inside postStartSetup().
…
and I was wrong.. it happens.
Logs from ~/.minikube/logs/lastStart.txt (where all the klog.Whatever() goes..) show that we never even reached postStartSetup().
we’re getting closer..
What if we look backwards, starting from the error itself.. I know a package that has the cat for sure:
pkg/minikube/style/style.go
and the crying cat is…
// pkg/minikube/style/style.go
// Config is a map of style name to style struct
// For consistency, ensure that emojis added render with the same width across platforms.
var Config = map[Enum]Options{
// ...
Embarrassed: {Prefix: "🤦 ", LowPrefix: LowWarning},
Sad: {Prefix: "😿 "}, // this one
Shrug: {Prefix: "🤷 "},
// ...
}
and gopls shows me that it is used only in a bunch of places..
pkg/minikube/style/style_enum.go
79: Sad
cmd/minikube/cmd/config/profile.go
84: out.ErrT(style.Sad, `Error loading profile config: {{.error}}`, out.V{"error": err})
93: out.ErrT(style.Sad, `Error while setting kubectl current context : {{.error}}`, out.V{"error": err})
cmd/minikube/cmd/delete.go
542: out.ErrT(style.Sad, deletionError.Error())
554: out.ErrT(style.Sad, "Multiple errors deleting profiles")
cmd/minikube/cmd/update-context.go
51: out.ErrT(style.Sad, `Error while setting kubectl current context: {{.error}}`, out.V{"error": err})
pkg/minikube/node/start.go
713: out.ErrT(style.Sad, `Failed to start {{.driver}} {{.driver_type}}. Running "{{.cmd}}" may fix it: {{.error}}`, out.V{"driver": drv, "driver_type": driver.MachineType(drv), "cmd": mustload.ExampleCmd(cc.Name, "delete"), "error": err})
pkg/minikube/out/out.go
422: msg := Sprintf(style.Sad, "If the above advice does not help, please let us know:")
pkg/minikube/service/service.go
291: out.Styled(style.Sad, "service {{.namespace_name}}/{{.service_name}} has no node port", out.V{"namespace_name": namespace, "service_name": service})
pkg/minikube/style/style.go
100: Sad: {Prefix: "😿 "},
It’s pretty obvious we’re looking at pkg/minikube/node/start.go’s
// pkg/minikube/node/start.go
// startHostInternal starts a new minikube host using a VM or None
func startHostInternal(api libmachine.API, cc *config.ClusterConfig, n *config.Node, delOnFail bool) (*host.Host, bool, error) {
It’s not exactly where we’re coming from. Let me figure it out:
pkg/minikube/node/start.go -- Provision()
|
| The furthest it makes sense to go..
| This is already familiar, it calls beginDownloadKicBaseImage()
| which is basically the kicBase cache logic
| we tinkered with last time.
|
|
pkg/minikube/node/start.go -- startMachine()
|
|
-----> pkg/minikube/node/start.go -- startHostInternal()
The createHost() we ended up at before the cat search is only a couple of layers deeper.
So we could draw this:
= pkg/minikube/node/start.go -- Provision()
|
|
= pkg/minikube/node/start.go -- startMachine()
|
|
= pkg/minikube/node/start.go -- startHostInternal()
|
|
= . ===😿====the=cat=error===
| ^
| |
= pkg/minikube/machine/start.go -- StartHost() |
| |
| |
= pkg/minikube/machine/start.go -- createHost() ------------ error is here.
|
|
= pkg/minikube/machine/start.go -- showHostInfo()
|
|
= =====🔥==the=creation=message======
…I’m never drawing that again..
It would have been much easier to stack-trace it inside a debugger and copy-paste the result.
We’re seeing the flame ‘cause createHost().. before doing anything else, calls showHostInfo() for every driver except the ssh one.
// pkg/minikube/machine/start.go
func createHost(api libmachine.API, cfg *config.ClusterConfig, n *config.Node) (*host.Host, error) {
// ...
if cfg.Driver != driver.SSH {
showHostInfo(nil, *cfg)
}
// this was the part I previously marked
// "config read and some other setup stuff..."
def := registry.Driver(cfg.Driver) // cfg.Driver -> just a string
if def.Empty() {
return nil, fmt.Errorf("unsupported/missing driver: %s", cfg.Driver)
}
dd, err := def.Config(*cfg, *n)
if err != nil {
return nil, errors.Wrap(err, "config")
}
data, err := json.Marshal(dd)
if err != nil {
return nil, errors.Wrap(err, "marshal")
}
h, err := api.NewHost(cfg.Driver, data)
if err != nil {
return nil, errors.Wrap(err, "new host")
}
defer postStartValidations(h, cfg.Driver)
// ...
That registry.Driver(cfg.Driver) seems interesting..
and I’ve already heard the term “registry” in a conversation on kübernetes’s slack,
regarding an issue..
the"registry"
That def := registry.Driver(string) function.. just takes a driver name and fetches the base definition for it; later, when def.Config() is called.. some driver-specific configuration magic starts to happen:
/// pkg/minikube/registry/global.go
// Driver gets a named driver from the global registry
func Driver(name string) DriverDef {
return globalRegistry.Driver(name)
}
// globalRegistry being a var: -- pkg/minikube/registry/global.go
var (
// globalRegistry is a globally accessible driver registry
globalRegistry = newRegistry()
)
// pkg/minikube/registry/registry.go
func newRegistry() *driverRegistry {
return &driverRegistry{
drivers: make(map[string]DriverDef),
driversByAlias: make(map[string]DriverDef),
}
}
// and that globalRegistry.Driver(string):
// pkg/minikube/registry/registry.go
// Driver returns a driver given a name
func (r *driverRegistry) Driver(name string) DriverDef {
r.lock.RLock()
defer r.lock.RUnlock()
def, ok := r.drivers[name]
if ok {
return def
}
// Check if we have driver def with name as alias
return r.driversByAlias[name]
}
I think now we know what a driver “registry” is:
// pkg/minikube/registry/registry.go
type driverRegistry struct {
drivers map[string]DriverDef
driversByAlias map[string]DriverDef
lock sync.RWMutex
}
Just a fancy struct that contains the drivers that minikube supports?
Hypothesis supported by the fact that DriverDef == …
// pkg/minikube/registry/registry.go
// DriverDef defines how to initialize and load a machine driver
type DriverDef struct {
// Name of the machine driver. It has to be unique.
Name string
// Alias contains a list of machine driver aliases. Each alias should also be unique.
Alias []string
// Config is a function that emits a configured driver struct
Config Configurator
// Init is a function that initializes a machine driver, if built-in to the minikube binary
Init Loader
// Status returns the installation status of the driver
Status StatusChecker
// Default is whether this driver is selected by default or not (opt-in).
Default bool
// Priority returns the prioritization for selecting a driver by default.
Priority Priority
}
We even have a doc.go file for that package registry:
// pkg/minikube/registry/doc.go
/*
Copyright 2018 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// This package contains the registry to enable a docker machine driver to be used
// in minikube.
package registry
Hmmmm… that’s not much to work with.
From this alone.. it sounds like an image registry.. which doesn’t seem to be the case.
Fortunately.. there is a drvs folder here that could enlighten us;
Let me show you a piece of the tree:
# from pkg/minikube/registry
$ tree drvs/
drvs/
├── docker
│ ├── docker.go
│ └── docker_test.go
├── init.go
├── kvm2
│ ├── doc.go
│ └── kvm2.go
├── podman
│ └── podman.go
├── ssh
│ └── ssh.go
...
└── vmwarefusion
├── doc.go
└── vmwarefusion.go
Nothing could be more enlightening.
The init.go is just a list of supported drivers:
// pkg/minikube/registry/drvs/init.go
package drvs
import (
// Register all of the drvs we know of
_ "k8s.io/minikube/pkg/minikube/registry/drvs/docker"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/hyperkit"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/hyperv"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/kvm2"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/none"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/parallels"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/podman"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/qemu2"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/ssh"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/virtualbox"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/vmware"
_ "k8s.io/minikube/pkg/minikube/registry/drvs/vmwarefusion"
)
And each driver (say docker..) has an init() function in its package that registers its definition inside the “registry”:
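// pkg/minikube/registry/drvs/docker/docker.go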
func init() {
if err := registry.Register(registry.DriverDef{
Name: driver.Docker,
Config: configure,
Init: func() drivers.Driver { return kic.NewDriver(kic.Config{OCIBinary: oci.Docker}) },
Status: status,
Default: true,
Priority: registry.HighlyPreferred,
}); err != nil {
panic(fmt.Sprintf("register failed: %v", err))
}
}
cobraish..
No point in showing Register() code.. it is clear that it just puts a DriverDef inside the globalRegistry.
So the picture is as follows:
- pkg/minikube/registry contains the global “registry”.. even tho we don’t directly interact with it, it contains all the drivers.
- the various pkg/minikube/registry/drvs packages put their driver inside the global registry at runtime, using init() functions, as if it had always been there.. so any new driver can hook into the global registry just by init-ing its package (sketched below).
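Just to make that last point concrete, here’s a hypothetical sketch of hooking a brand new driver in — the driver name, configure and status below are made up, and the shape is simply copied from the docker example above (the exact Configurator/StatusChecker signatures live in pkg/minikube/registry):
// hypothetical pkg/minikube/registry/drvs/mydriver/mydriver.go -- illustration only
package mydriver

import (
	"fmt"

	"github.com/docker/machine/libmachine/drivers"

	"k8s.io/minikube/pkg/minikube/config"
	"k8s.io/minikube/pkg/minikube/registry"
)

// configure emits the configured driver struct that createHost() will later json.Marshal()
func configure(cc config.ClusterConfig, n config.Node) (interface{}, error) {
	return nil, fmt.Errorf("mydriver: not implemented")
}

// status reports whether the driver is usable on this machine
func status() registry.State {
	return registry.State{Installed: true, Healthy: true}
}

func init() {
	// registering at init() time is all it takes; drvs/init.go just has to
	// blank-import this package so this init() actually runs
	if err := registry.Register(registry.DriverDef{
		Name:   "mydriver",
		Config: configure,
		Init:   func() drivers.Driver { return nil /* would return the actual driver impl */ },
		Status: status,
	}); err != nil {
		panic(fmt.Sprintf("register failed: %v", err))
	}
}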
back to our bug..
So we’re back to pkg/minikube/machine/start.go - createHost()
// pkg/minikube/machine/start.go
func createHost(api libmachine.API, cfg *config.ClusterConfig, n *config.Node) (*host.Host, error) {
// ...
def := registry.Driver(cfg.Driver) // DONE
if def.Empty() {
return nil, fmt.Errorf("unsupported/missing driver: %s", cfg.Driver)
}
dd, err := def.Config(*cfg, *n) // << could be a source of issues..
// keeping in mind and returning later
if err != nil {
return nil, errors.Wrap(err, "config")
}
data, err := json.Marshal(dd)
if err != nil {
return nil, errors.Wrap(err, "marshal")
}
h, err := api.NewHost(cfg.Driver, data) // << HERE.
if err != nil {
return nil, errors.Wrap(err, "new host")
}
defer postStartValidations(h, cfg.Driver)
// ...
Even tho we’re failing on timedCreateHost() inside k8s.io/minikube/pkg/minikube/machine.createHost, I’d still give api.NewHost() a look.. just to have a little more info about the process.
…
Oh.. that’s not minikube. That api.NewHost() comes from “github.com/docker/machine/libmachine”.
That’s part of the “docker-centric” heritage of minikube :)
Looking at this and at the description for the docker machine repo…
// libmachine/libmachine.go @ https://github.com/docker/machine.git
func (api *Client) NewHost(driverName string, rawDriver []byte) (*host.Host, error) {
driver, err := api.clientDriverFactory.NewRPCClientDriver(driverName, rawDriver)
// ...
return &host.Host{
// config and filepaths
// based on the provided driverName..
}, nil
}
It could be that the bigger picture is as follows..
- We’re instantiating/finding our “node”, which could be any kind of thing depending on the driver we’re choosing:
- container image for the KiC(Kübernetes in Container) workflow
- vm image for the qemu/virtualbox/whatever..
- a generic host with an sshd installed
- our localhost that we acknowledged has everything in place to kick kübernetes
- We’re creating a “docker machine” (giving docker capabilities to the vm/remote host) if needed, though this should require some discrimination based on the driver
- We’re operating our cluster by the means provided by the Driver interface
But it’s too soon for that..
We could be looking at another chunk we’d have to undockerize from minikube.
Given that we’re using the podman driver, we’d have no need for a docker machine in the first place.
…
My bad.. it’s not actually the archived https://github.com/docker/machine.git that we’re depending on..
This quite interesting read here,
and a summary of it here, describe what happened.
And what happened is that now we’re
// go.mod
replace (
github.com/docker/machine => github.com/machine-drivers/machine v0.7.1-0.20211105063445-78a84df85426
)
And we’re in fact using https://github.com/machine-drivers/machine, which seems still maintained, keeping its fork relationship.
undockerize?
Back to our timedCreateHost(); we can see that it’s only a timer around an api.Create() call.
// pkg/minikube/machine/start.go
func timedCreateHost(h *host.Host, api libmachine.API, t time.Duration) error {
timeout := make(chan bool, 1)
go func() {
time.Sleep(t)
timeout <- true
}()
createFinished := make(chan bool, 1)
var err error
go func() {
err = api.Create(h)
createFinished <- true
}()
select {
case <-createFinished:
if err != nil {
// Wait for all the logs to reach the client
time.Sleep(2 * time.Second)
return errors.Wrap(err, "create")
}
return nil
case <-timeout:
return fmt.Errorf("create host timed out in %f seconds", t.Seconds())
}
}
where the api implementation used is LocalClient (dlv told me..)
That doesn’t seem part of the docker machine..
Even grepping the docker machine repo doesn’t show anything… would it be possible that..?
Yes.. grepping shows that minikube implements its own libmachine api interface.
Struct and Create() method look like this.
// pkg/minikube/machine/client.go
// LocalClient is a non-RPC implementation
// of the libmachine API
type LocalClient struct {
certsDir string
storePath string
*persist.Filestore
legacyClient libmachine.API
flock *fslock.Lock
}
// Create creates the host
func (api *LocalClient) Create(h *host.Host) error {
klog.Infof("LocalClient.Create starting")
start := time.Now()
defer func() {
klog.Infof("LocalClient.Create took %s", time.Since(start))
}()
def := registry.Driver(h.DriverName)
if def.Empty() {
return fmt.Errorf("driver %q does not exist", h.DriverName)
}
if def.Init == nil {
// NOTE: This will call provision.DetectProvisioner
return api.legacyClient.Create(h)
}
steps := []struct {
name string
f func() error
}{
{
"bootstrapping certificates",
func() error {
// Lock is needed to avoid race condition in parallel Docker-Env test because issue #10107.
// CA cert and client cert should be generated atomically, otherwise might cause bad certificate error.
lockErr := api.flock.LockWithTimeout(time.Second * 5)
if lockErr != nil {
return fmt.Errorf("failed to acquire bootstrap client lock: %v " + lockErr.Error())
}
defer func() {
lockErr = api.flock.Unlock()
if lockErr != nil {
klog.Errorf("failed to release bootstrap cert client lock: %v", lockErr.Error())
}
}()
certErr := cert.BootstrapCertificates(h.AuthOptions())
return certErr
},
},
{
"precreate",
h.Driver.PreCreateCheck,
},
{
"saving",
func() error {
return api.Save(h)
},
},
{
"creating",
h.Driver.Create,
},
{
"waiting",
func() error {
if driver.BareMetal(h.Driver.DriverName()) {
return nil
}
return mcnutils.WaitFor(drivers.MachineInState(h.Driver, state.Running))
},
},
{
"provisioning",
func() error {
// Skippable because we don't reconfigure Docker?
if driver.BareMetal(h.Driver.DriverName()) {
return nil
}
return provisionDockerMachine(h)
},
},
}
for _, step := range steps {
if err := step.f(); err != nil {
return errors.Wrap(err, step.name)
}
}
return nil
}
My editor is having a hard time navigating the machine mod.. I’m cloning and using it inside the workspace.
Each of the steps in the previous function gives a name and a function to call;
Some steps use anonymous minikube functions, some use docker-machine ones, some rely on the underlying driver[!]
This driver is a docker-machine interface with a wide range of implementations to fulfill any kind of need (the interface itself is sketched right after this list); here are just some of the implementations..
drivers/amazonec2/amazonec2.go
64: type Driver struct {
drivers/azure/azure.go
67: type Driver struct {
drivers/digitalocean/digitalocean.go
23: type Driver struct {
drivers/errdriver/error.go
11: type Driver struct {
drivers/exoscale/exoscale.go
26: type Driver struct {
drivers/fakedriver/fakedriver.go
11: type Driver struct {
drivers/google/google.go
17: type Driver struct {
drivers/hyperv/hyperv.go
19: type Driver struct {
drivers/openstack/openstack.go
21: type Driver struct {
drivers/rackspace/rackspace.go
13: type Driver struct {
drivers/virtualbox/virtualbox.go
46: type Driver struct {
drivers/vmwarevcloudair/vcloudair.go
24: type Driver struct {
drivers/vmwarevsphere/vsphere.go
47: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/drivers/kic/kic.go
53: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/drivers/kvm/kvm.go
38: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/drivers/none/none.go
47: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/drivers/qemu/qemu.go
53: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/drivers/ssh/ssh.go
46: type Driver struct {
/home/andrew/go/src/wspace-container/minikube/pkg/minikube/tests/driver_mock.go
33: type MockDriver struct {
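For orientation, this is roughly what the drivers.Driver interface those structs implement looks like — abridged from memory, the full definition lives in libmachine/drivers/drivers.go in the machine repo:
// libmachine/drivers/drivers.go @ machine (abridged, from memory)
// state.State comes from libmachine/state
type Driver interface {
	// Create a host using the driver's config
	Create() error
	// DriverName returns the name of the driver
	DriverName() string
	// GetIP returns an IP or hostname that this host is available at
	GetIP() (string, error)
	// GetState returns the state that the host is in (running, stopped, etc.)
	GetState() (state.State, error)
	// GetURL returns a Docker-compatible host URL for connecting to this host
	GetURL() (string, error)
	// PreCreateCheck allows for pre-create operations before creation
	PreCreateCheck() error
	// Remove a host
	Remove() error
	// Start a host
	Start() error
	// Stop a host gracefully
	Stop() error
	// Kill stops a host forcefully
	Kill() error
	// Restart a host
	Restart() error
	// ...plus the SSH- and flag-related methods omitted here
}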
This project aimed at bringing docker anywhere..
Minikube itself implements the driver in a number of chunks..
by pkg/drivers subpackages to be precise (kvm, kic, ..)
There’s really no need to show the structs or factory methods.. the locations are above; let’s keep going.
We don’t seem to customize the precreate step function.. so we’re defaulting to docker-machine’s BaseDriver, which is this simple:
// libmachine/drivers/base.go @ machine
// PreCreateCheck is called to enforce pre-creation steps
func (d *BaseDriver) PreCreateCheck() error {
return nil
}
I’m particularly interested in the “create” step, which is the one failing with rootless podman.
I would guess that the driver implementation we’re looking at is the “Kübernetes In Container” (a.k.a. KiC) driver..
We’re looking at pkg/drivers/kic/kic.go - func (d *Driver) Create() error
That’s already a start.. let’s step into it to see exactly where it breaks.
one step forward(?)
Found it!
A big chunk of minikube seems to be putting together this long sh command:
$ podman run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --volume minikube:/var:exec --memory=8000mb -e container=podman --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1673540226-15630
No joking.. it’s actually a cli run (it could’ve been either this or witchcraft):
// pkg/drivers/kic/oci/oci.go
// CreateContainer creates a container with "docker/podman run"
func createContainer(ociBin string, image string, opts ...createOpt) error {
// ...
if rr, err := runCmd(exec.Command(ociBin, args...)); err != nil {
// full error: docker: Error response from daemon: Range of CPUs is from 0.01 to 8.00, as there are only 8 CPUs available.
if strings.Contains(rr.Output(), "Range of CPUs is from") && strings.Contains(rr.Output(), "CPUs available") { // CPUs available
return ErrCPUCountLimit
}
// example: docker: Error response from daemon: Address already in use.
if strings.Contains(rr.Output(), "Address already in use") {
return ErrIPinUse
}
return err
}
That.. if fired by hand.. returns:
Error: unable to find network with name or ID minikube: network not found
We’re one step closer..
I remember this being a step prior to container creation:
// pkg/drivers/kic/kic.go
func (d *Driver) Create() error {
// ...
if gateway, err := oci.CreateNetwork(d.OCIBinary, networkName, d.NodeConfig.Subnet, staticIP); err != nil {
msg := "Unable to create dedicated network, this might result in cluster IP change after restart: {{.error}}"
args := out.V{"error": err}
if staticIP != "" {
exit.Message(reason.IfDedicatedNetwork, msg, args)
}
out.WarningT(msg, args)
// ...
Stepping into it..
Seeing that CreateNetwork() seems to flow perfectly.. There is one thing that doesn’t convince me: the oci.tryCreateDockerNetwork() function.. which takes an ociBin as a parameter, so it should be fine.. but we’re failing to find a resource that this function seems to be responsible for. I’m thinking of an error condition that is not checked for.
There we go..
At the end of the flow for oci.tryCreateDockerNetwork(), the same thing happens as in createContainer(): an exec of the ociBin. This is what’s happening at runtime, once everything is initialized:
| > k8s.io/minikube/pkg/drivers/kic/oci.tryCreateDockerNetwork() ./pkg/drivers/kic/oci/network_create.go:146 (PC: 0x145927a)
141: args = append(args, fmt.Sprintf("com.docker.network.driver.mtu=%d", mtu))
142: }
143: }
144: args = append(args, fmt.Sprintf("--label=%s=%s", CreatedByLabelKey, "true"), fmt.Sprintf("--label=%s=%s", ProfileLabelKey, name), name)
145:
=> 146: rr, err := runCmd(exec.Command(ociBin, args...))
147: if err != nil {
148: klog.Errorf("failed to create %s network %s %s with gateway %s and mtu of %d: %v", ociBin, name, subnet.CIDR, subnet.Gateway, mtu, err)
149: // Pool overlaps with other one on this address space
150: if strings.Contains(rr.Output(), "Pool overlaps") {
151: return nil, ErrNetworkSubnetTaken
(dlv) p args
[]string len: 8, cap: 10, [
"network",
"create",
"--driver=bridge",
"--subnet=192.168.49.0/24",
"--gateway=192.168.49.1",
"--label=created_by.minikube.sigs.k8s.io=true",
"--label=name.minikube.sigs.k8s.io=minikube",
"minikube",
]
So we should be more than able to do the same thing by hand:
$ podman network create -driver=bridge --subnet=192.168.49.0/24 --gateway=192.168.49.1 --label=created_by.minikube.sigs.k8s.io=true --label=name.minikube.sigs.k8s.io=minikube minikube
Which fails with: Error: unsupported driver river=bridge: invalid argument
But apparently minikube is not detecting it;
the code:
// pkg/drivers/kic/oci/network_create.go
func tryCreateDockerNetwork(ociBin string, subnet *network.Parameters, mtu int, name string) (net.IP, error) {
// ...
rr, err := runCmd(exec.Command(ociBin, args...))
if err != nil {
klog.Errorf("failed to create %s network %s %s with gateway %s and mtu of %d: %v", ociBin, name, subnet.CIDR, subnet.Gateway, mtu, err)
// Pool overlaps with other one on this address space
if strings.Contains(rr.Output(), "Pool overlaps") {
return nil, ErrNetworkSubnetTaken
}
if strings.Contains(rr.Output(), "failed to allocate gateway") && strings.Contains(rr.Output(), "Address already in use") {
return nil, ErrNetworkGatewayTaken
}
if strings.Contains(rr.Output(), "is being used by a network interface") {
return nil, ErrNetworkGatewayTaken
}
return nil, fmt.Errorf("create %s network %s %s with gateway %s and MTU of %d: %w", ociBin, name, subnet.CIDR, subnet.Gateway, mtu, err)
}
return gateway, nil
}
What happens is that, at the end of runCmd(), err is nil; so we return as if everything’s ok.
…
Oh.. my bad.. I was missing a ‘-’ in ‘-driver=bridge’: it should’ve been ‘--driver=bridge’.
Hmmm.. it works..
Network’s there:
$ podman network ls
NETWORK ID NAME DRIVER
5086431107ca minikube bridge
2f259bab93aa podman bridge
Then also the createContainer() command works..
$ podman run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --volume minikube:/var:exec --memory=8000mb -e container=podman --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1673540226-15630
It returns 0
Dear god…
…
Oh.. when I was running $ podman run ... I was doing it before $ podman network create ..., hence the confusion..
The fact that I missed a ‘-’ while parsing the args dumped by the debugger (I did it by hand..) seemed to confirm a false trail.
one step forward(!)
We were not failing inside CreateContainerNode() (now I don’t even know why we were getting there in the first place..); we’re actually failing inside oci.PrepareContainerNode().. which does the volume preparation.. which matches our initial error:
❌ Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: creating: setting up container node: creating volume for minikube container: podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:
stderr:
Error: volume with name minikube already exists: volume already exists
Which I guess I didn’t even read carefully..
In my defense, if I retry $ minikube start without a $ minikube delete --all first.. the error message changes to:
❌ Exiting due to GUEST_PROVISION: Failed to start host: driver start: start: podman start minikube: exit status 125
stdout:
stderr:
Error: no container with name or ID "minikube" found: no such container
So we’re failing the oci.PrepareContainerNode() step here:
// pkg/drivers/kic/kic.go
func (d *Driver) Create() error {
// ...
if err := oci.PrepareContainerNode(params); err != nil {
return errors.Wrap(err, "setting up container node")
}
// ...
which in turn fails the “creating” step here:
// pkg/minikube/machine/client.go
func (api *LocalClient) Create(h *host.Host) error
// ...
{
"creating",
h.Driver.Create,
},
// ...
for _, step := range steps {
if err := step.f(); err != nil {
return errors.Wrap(err, step.name)
}
}
return nil
and so forth…
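Side note, since it only clicked for me while tracing this: that long “creating host: create: creating: setting up container node: ...” prefix in the user-facing error is nothing more than each layer’s errors.Wrap() message stacking up. A tiny standalone illustration, using the messages we just walked past:
package main

import (
	"fmt"

	"github.com/pkg/errors"
)

func main() {
	// innermost failure, e.g. the `podman volume create ...` exiting with 125
	err := fmt.Errorf("podman volume create minikube ...: exit status 125")
	err = errors.Wrap(err, "creating volume for minikube container") // oci.PrepareContainerNode
	err = errors.Wrap(err, "setting up container node")              // kic Driver.Create
	err = errors.Wrap(err, "creating")                               // the LocalClient.Create step name
	err = errors.Wrap(err, "create")                                 // timedCreateHost
	err = errors.Wrap(err, "creating host")                          // createHost
	fmt.Println(err)
	// prints: creating host: create: creating: setting up container node:
	//         creating volume for minikube container: podman volume create minikube ...: exit status 125
}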
oci.PrepareContainerNode() calls another ociBin-run-like function, called createVolume()…
This is the incriminated function:
// createVolume creates a volume to be attached to the container with correct labels and prefixes based on profile name
// Caution ! if volume already exists does NOT return an error and will not apply the minikube labels on it.
// TODO: this should be fixed as a part of https://github.com/kubernetes/minikube/issues/6530
func createVolume(ociBin string, profile string, nodeName string) error {
if _, err := runCmd(exec.Command(ociBin, "volume", "create", nodeName, "--label", fmt.Sprintf("%s=%s", ProfileLabelKey, profile), "--label", fmt.Sprintf("%s=%s", CreatedByLabelKey, "true"))); err != nil {
return err
}
return nil
}
Just a $ podman volume create with a bunch of labels.. which, if I had to guess, are used by minikube to keep track of what minikube creates, as opposed to what the user creates for their own purposes outside the minikube perspective.. so as not to clean out user-created resources during the cleanup phase.
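For instance (not minikube code, just to make the point concrete), the labels make it trivial to list only the volumes minikube itself created — exactly what a cleanup routine could rely on; both docker and podman accept --filter label=... and --quiet on volume ls:
// sketch only: list the volumes minikube itself created, via its "created_by" label
// (CreatedByLabelKey and runCmd come from pkg/drivers/kic/oci)
func listMinikubeVolumes(ociBin string) ([]string, error) {
	rr, err := runCmd(exec.Command(ociBin, "volume", "ls", "--quiet",
		"--filter", fmt.Sprintf("label=%s=true", CreatedByLabelKey)))
	if err != nil {
		return nil, err
	}
	return strings.Fields(rr.Output()), nil
}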
Our error is that the minikube volume already exists… The comment on that function is pretty talkative too.
So what is #6530 about?
It’s marked “Closed” but I cannot see any reference to any merged PR.
$ podman volume ls shows a minikube volume… then $ minikube delete --all is issued; I’d expect to see no more minikube volume.. but instead.. it’s still there.
So it should be safe to assume that removing that volume would fix our “podman-rootless-minikube-not-starting” issue. Really hope so..
Let’s try a clean run:
$ podman volume rm minikube
$ podman system prune --all
$ minikube delete --all
## aaaaand..
$ minikube start
Fuck.
It doesn’t work.. same error:
✋ Stopping node "minikube" ...
🔥 Deleting "minikube" in podman ...
🤦 StartHost failed, but will try again: creating host: create: creating: create kic node: container name "minikube": log: 2023-01-24T14:20:51.264852000+02:00 + grep -qw cpu /sys/fs/cgroup/cgroup.controllers
2023-01-24T14:20:51.265972000+02:00 + echo 'ERROR: UserNS: cpu controller needs to be delegated'
2023-01-24T14:20:51.266089000+02:00 ERROR: UserNS: cpu controller needs to be delegated
2023-01-24T14:20:51.266169000+02:00 + exit 1: container exited unexpectedly
🔥 Creating podman container (CPUs=2, Memory=8000MB) ...
😿 Failed to start podman container. Running "minikube delete" may fix it: creating host: create: creating: setting up container node: creating volume for minikube container: podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:
stderr:
Error: volume with name minikube already exists: volume already exists
❌ Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: creating: setting up container node: creating volume for minikube container: podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true: exit status 125
stdout:
stderr:
Error: volume with name minikube already exists: volume already exists
Ok.. getting a grip on it
It’s a container creation that fails for some “cpu controller” issue.. and oci.PrepareContainerNode() is not safe to run multiple times.. so this results in us retrying the container creation, but in the end failing with an unrelated error.
EDIT: Actually it’s not the container creation that fails.. The container gets created fine..
The newly created container immediately exits.. logging this:
+ userns=
+ grep -Eqv '0[[:space:]]+0[[:space:]]+4294967295' /proc/self/uid_map
+ userns=1
+ echo 'INFO: running in a user namespace (experimental)'
INFO: running in a user namespace (experimental)
+ validate_userns
+ [[ -z 1 ]]
+ local nofile_hard
++ ulimit -Hn
+ nofile_hard=1048576
+ local nofile_hard_expected=64000
+ [[ 1048576 -lt 64000 ]]
+ [[ -f /sys/fs/cgroup/cgroup.controllers ]]
+ for f in cpu memory pids
+ grep -qw cpu /sys/fs/cgroup/cgroup.controllers
+ echo 'ERROR: UserNS: cpu controller needs to be delegated'
ERROR: UserNS: cpu controller needs to be delegated
+ exit 1
We could easily fix at least the wrong-error-message issue by adding a check on the volume.
..AAAnd we could (at some point) include volume cleanup when $ minikube delete --all is issued (TODO);
which seems tied to #15222
…
Or maybe not.. It just seems that I didn’t know about the extra --purge flag to $ minikube delete --all, which seems to also remove volumes when we’re using podman.. The issue is still relevant by the way: it doesn’t remove docker volumes.
PS.
It’s confusing.. yeah..
That last volume thing confused me as well.
…
While I was writing this, I was running $ minikube start after $ minikube delete --all --purge with rootless podman, just to be sure.
And then something very strange happened.. It worked..
…
..But only because the --purge flag, as described by $ minikube delete --help, has the effect of removing the .minikube folder in the home directory.. effectively removing the cache and the minikube config.
So I was running with the docker driver instead of podman.
That’s why it was working.. I’ll spare you the logs and my theories on this false trail.. this article is getting long.
Plus, the help msg for the --purge flag doesn’t mention volumes..
Someone on slack stated that minikube delete --all is the one that handles volumes.. so any claim I made above is to be tossed..
I’m not sure what happened.
starting to fix stuff..
What do I do now.. fix the previous one while the memory is still fresh.. or go to the next one and see if the workarounds work first, so I can mark it as “yes, it could work” and then apply fixes to the stack of errors.. hoping not to find another n errors along the way.
Writing things down helps.. I was about to go with the latter, but on second thought…
The “volume already exists” error
This should be no big deal to solve.. We could just add a check for the volume presence first.
Let’s check it out on a new branch.. separating PRs..
I talked about it on slack and created #15697.
The original function
// pkg/drivers/kic/oci/volumes.go
// createVolume creates a volume to be attached to the container with correct labels and prefixes based on profile name
// Caution ! if volume already exists does NOT return an error and will not apply the minikube labels on it.
// TODO: this should be fixed as a part of https://github.com/kubernetes/minikube/issues/6530
func createVolume(ociBin string, profile string, nodeName string) error {
if _, err := runCmd(exec.Command(ociBin, "volume", "create", nodeName, "--label", fmt.Sprintf("%s=%s", ProfileLabelKey, profile), "--label", fmt.Sprintf("%s=%s", CreatedByLabelKey, "true"))); err != nil {
return err
}
return nil
}
The new version:
// createVolume creates a volume to be attached to the container with correct labels and prefixes based on profile name
// Caution ! if volume already exists does NOT return an error and will not apply the minikube labels on it.
func createVolume(ociBin string, profile string, nodeName string) error {
rr, err := runCmd(exec.Command(ociBin, "volume", "ls"))
if err == nil {
if strings.Contains(rr.Output(), nodeName) {
klog.Infof("Trying to create %s volume using %s: Volume already exists !", nodeName, ociBin)
return nil
}
_, err = runCmd(exec.Command(ociBin, "volume", "create", nodeName, "--label", fmt.Sprintf("%s=%s", ProfileLabelKey, profile), "--label", fmt.Sprintf("%s=%s", CreatedByLabelKey, "true")))
}
return err
}
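One thing I’d keep an eye on (just a sketch of an alternative, not what the PR does): strings.Contains() on the raw volume ls output would also match volumes whose name merely contains nodeName, so an exact per-line match is a bit stricter:
// volumeExists reports whether ociBin already has a volume named exactly nodeName.
// Sketch only; runCmd is the same helper the rest of pkg/drivers/kic/oci uses.
func volumeExists(ociBin string, nodeName string) (bool, error) {
	rr, err := runCmd(exec.Command(ociBin, "volume", "ls", "--format", "{{.Name}}"))
	if err != nil {
		return false, err
	}
	for _, name := range strings.Split(strings.TrimSpace(rr.Output()), "\n") {
		if name == nodeName {
			return true, nil
		}
	}
	return false, nil
}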
Commit message:
Adds check for volume existence in oci driver's createVolume()
As the function's description states:
It should not return err or change labels if volume already exists..
Info that we found a volume might be helpful tho..
As unlikely as it may sound.. user might create a minikube volume
and wonder why its actual minikube cluster is not starting.
We're deleteing TODO msg.
The issue was already closed.
Done…
Next.
The “cpu controller” error
This time it won’t be that easy.. I think.. I think this time we’re outside the minikube sources.. the error is inside the kicBase container itself.
Placing an os.Exit() here:
// pkg/drivers/kic/oci/oci.go
func CreateContainerNode(p CreateParams) error {
// ...
os.Exit(0) // HERE --
if err := createContainer(p.OCIBinary, p.Image, withRunArgs(runArgs...), withMounts(p.Mounts), withPortMappings(p.PortMappings)); err != nil {
return errors.Wrap(err, "create container")
}
if err := retry.Expo(checkRunning(p), 15*time.Millisecond, 25*time.Second); err != nil {
excerpt := LogContainerDebug(p.OCIBinary, p.Name)
will let minikube instantiate all the resources we’d need and then stop, so we can successfully launch the following by hand:
$ podman run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --name minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --volume minikube:/var:exec --memory=8000mb -e container=podman --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1674164627-15541
Which launches.. But the resulting container immediately exits;
$ podman logs someImageID
shows us some more output:
+ userns=
+ grep -Eqv '0[[:space:]]+0[[:space:]]+4294967295' /proc/self/uid_map
+ userns=1
+ echo 'INFO: running in a user namespace (experimental)'
INFO: running in a user namespace (experimental)
+ validate_userns
+ [[ -z 1 ]]
+ local nofile_hard
++ ulimit -Hn
+ nofile_hard=1048576
+ local nofile_hard_expected=64000
+ [[ 1048576 -lt 64000 ]]
+ [[ -f /sys/fs/cgroup/cgroup.controllers ]]
+ for f in cpu memory pids
+ grep -qw cpu /sys/fs/cgroup/cgroup.controllers
+ echo 'ERROR: UserNS: cpu controller needs to be delegated'
ERROR: UserNS: cpu controller needs to be delegated
+ exit 1
This err message at the end led me to a more or less related answer on stackoverflow; “I want to run rootless containers with podman” seems like what I’m trying to accomplish.
And in fact, repeating the whole podman setup procedure to run the above-mentioned container as root produced a whole different result:
$ sudo podman volume create minikube --label name.minikube.sigs.k8s.io=minikube --label created_by.minikube.sigs.k8s.io=true
$ sudo podman network create --driver=bridge --subnet=192.168.49.0/24 --gateway=192.168.49.1 --label=created_by.minikube.sigs.k8s.io=true --label=name.minikube.sigs.k8s.io=minikube minikube
$ sudo podman run -it --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname minikube --label created_by.minikube.sigs.k8s.io=true --label name.minikube.sigs.k8s.io=minikube --label role.minikube.sigs.k8s.io= --label mode.minikube.sigs.k8s.io=minikube --network minikube --ip 192.168.49.2 --volume minikube:/var:exec --memory=8000mb -e container=podman --expose 8443 --publish=127.0.0.1::8443 --publish=127.0.0.1::22 --publish=127.0.0.1::2376 --publish=127.0.0.1::5000 --publish=127.0.0.1::32443 gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1674164627-15541 -- /bin/sh
[ OK ] Listening on D-Bus System Message Bus Socket.
Starting Docker Socket for the API.
Starting Podman API Socket.
[ OK ] Listening on Docker Socket for the API.
[ OK ] Listening on Podman API Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
Starting containerd container runtime...
[ OK ] Started D-Bus System Message Bus.
Starting minikube automount...
Starting OpenBSD Secure Shell server...
[ OK ] Finished minikube automount.
[ OK ] Started OpenBSD Secure Shell server.
[ OK ] Started containerd container runtime.
Starting Docker Application Container Engine...
[ OK ] Started Docker Application Container Engine.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Finished Update UTMP about System Runlevel Changes.
It’s systemd! We’re inside the container (we can’t do much tho..).
If we execute it with its entrypoint – /usr/local/bin/entrypoint (as $ podman inspect imageID shows) – instead of going to bash, we obtain a working “node” container.
$ sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b215ff45b340 gcr.io/k8s-minikube/kicbase-builds:v0.0.36-1674164627-15541 About a minute ago Up About a minute ago 127.0.0.1:39057->22/tcp, 127.0.0.1:46879->2376/tcp, 127.0.0.1:46527->5000/tcp, 127.0.0.1:35029->8443/tcp, 127.0.0.1:38423->32443/tcp minikube
The kicBase image
Ok now where does the kicBase image even come from?
The minikube site has this tutorial that explains how to build the .iso; it’s not exactly what we’re looking for, but I can see that it’s a make command… Could it be?
$ make help
Available targets for minikube v1.28.0
--------------------------------------
all Build all different minikube components
drivers Build Hyperkit and KVM2 drivers
cross Build minikube for all platform
exotic Build minikube for non-amd64 linux
retro Build minikube for legacy 32-bit linux
windows Build minikube for Windows 64bit
darwin Build minikube for Darwin 64bit
linux Build minikube for Linux 64bit
goimports Run goimports and list the files differs from goimport's
golint Run golint
gocyclo Run gocyclo (calculates cyclomatic complexities)
lint Run lint
lint-ci Run lint-ci
apt Generate apt package file
## Here it is...
local-kicbase Builds the kicbase image and tags it local/kicbase:latest and local/kicbase:$(KIC_VERSION)-$(COMMIT_SHORT)
local-kicbase-debug Builds a local kicbase image and switches source code to point to it
build-kic-base-image Build multi-arch local/kicbase:latest
push-kic-base-image Push multi-arch local/kicbase:latest to all remote registries
upload-preloaded-images-tar Upload the preloaded images for oldest supported, newest supported, and default kubernetes versions to GCS.
# Makefile
.PHONY: local-kicbase
local-kicbase: ## Builds the kicbase image and tags it local/kicbase:latest and local/kicbase:$(KIC_VERSION)-$(COMMIT_SHORT)
docker build -f ./deploy/kicbase/Dockerfile -t local/kicbase:$(KIC_VERSION) --build-arg VERSION_JSON=$(VERSION_JSON) --build-arg COMMIT_SHA=${VERSION}-$(COMMIT_NOQUOTES) --cache-from $(KICBASE_IMAGE_GCR) .
docker tag local/kicbase:$(KIC_VERSION) local/kicbase:latest
docker tag local/kicbase:$(KIC_VERSION) local/kicbase:$(KIC_VERSION)-$(COMMIT_SHORT)
nobody is gonna read this article at this point…
Had to apply a couple of little fixes in order to make make build-kic-base-image work..
Later a maintainer on slack told me to stick to make local-kicbase instead: the former is used to build the kicBase for multiple archs.. which I guess is not supported here?
ERROR: docker exporter does not currently support exporting manifest lists
So: either local-kicbase.. or just comment out the extra archs from the Makefile.
But now we have an image:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
local/kicbase latest e099056031d5 19 minutes ago 1.15GB
local/kicbase v0.0.36-1674164627-15541 e099056031d5 19 minutes ago 1.15GB
local/kicbase v0.0.36-1674164627-15541-1784105c6 e099056031d5 19 minutes ago 1.15GB
and since we’re trying to make it work on podman, a docker save / podman load afterwards..
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> e099056031d5 20 minutes ago 1.16 GB
good old imageID.
First thing is to create the resources with the previous podman network/volume create commands..
Then again, running the container works, but it immediately stops with the same logs == we’re on the same track.
ERROR: UserNS: cpu controller needs to be delegated
This cpu controller really wants to be delegated; where does docker pick up its stuff to build the image?
As the Makefile points out.. there’s a Dockerfile inside ./deploy/kicbase; it’s quite huge, so I’m not posting it.. only the relevant parts:
Number one:
# ./deploy/kicbase/Dockerfile
# ...
COPY --from=auto-pause /src/cmd/auto-pause/auto-pause-${TARGETARCH} /bin/auto-pause
# Install dependencies, first from apt, then from release tarballs.
# NOTE: we use one RUN to minimize layers. <--- Here
#
hmm.. we could (possibly?) use some buildah/Containerfile build types.. Dunno what the actual state of the art for docker builds is..(TODO)
Number two:
#./deploy/kicbase/Dockerfile
# First we must ensure that our util scripts are executable.
#
# The base image already has: ssh, apt, snapd, but we need to install more packages.
# Packages installed are broken down into (each on a line):
# - packages needed to run services (systemd)
# - packages needed for kubernetes components
# - packages needed by the container runtime
# - misc packages kind uses itself
# - packages that provide semi-core kubernetes functionality
# After installing packages we cleanup by:
# - removing unwanted systemd services
# - disabling kmsg in journald (these log entries would be confusing)
#
# Next we ensure the /etc/kubernetes/manifests directory exists. Normally
# a kubeadm debian / rpm package would ensure that this exists but we install
# freshly built binaries directly when we build the node image.
#
# Finally we adjust tempfiles cleanup to be 1 minute after "boot" instead of 15m
# This is plenty after we've done initial setup for a node, but before we are
# likely to try to export logs etc.
The whole workflow is documented. Wonderful! It’s exactly how it’s written.. a couple of RUNs for each piece so… Even if not configured to be used, all the pieces (docker/podman/containerd/crio/crun/…) are still inside the image.
Dumping some random RUNs:
# ./deploy/kicbase/Dockerfile
# install cri-o based on https://github.com/cri-o/cri-o/blob/release-1.24/README.md#installing-cri-o
RUN export ARCH=$(dpkg --print-architecture | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm-v7/') && \
if [ "$ARCH" != "ppc64le" ] && [ "$ARCH" != "arm-v7" ]; then sh -c "echo 'deb https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/${CRIO_VERSION}/xUbuntu_20.04/ /' > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:${CRIO_VERSION}.list" && \
curl -LO https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/${CRIO_VERSION}/xUbuntu_20.04/Release.key && \
apt-key add - < Release.key && \
clean-install cri-o cri-o-runc; fi
# ./deploy/kicbase/Dockerfile
# Install cri-dockerd from pre-compiled binaries stored in GCS, this is way faster than building from source in multi-arch
RUN echo "Installing cri-dockerd" && \
curl -L "https://storage.googleapis.com/kicbase-artifacts/cri-dockerd/${CRI_DOCKERD_VERSION}/${TARGETARCH}/cri-dockerd" -o /usr/bin/cri-dockerd && chmod +x /usr/bin/cri-dockerd && \
curl -L "https://storage.googleapis.com/kicbase-artifacts/cri-dockerd/${CRI_DOCKERD_VERSION}/cri-docker.socket" -o /usr/lib/systemd/system/cri-docker.socket && \
curl -L "https://storage.googleapis.com/kicbase-artifacts/cri-dockerd/${CRI_DOCKERD_VERSION}/cri-docker.service" -o /usr/lib/systemd/system/cri-docker.service
make kicBase is slooooow…
What I’m doing now is trying to guess which thing is responsible for that ERROR: UserNS: cpu controller needs to be delegated error, which I’d never seen inside a container before..
The thing is that building the whole kicbase is a time-consuming process.. But do we need the full 1.16G image?
I guess not.. so we’re stripping it down.
What I want to reproduce is the same ERROR output from the official kicBase.
Just by trying to get a shell inside of it for now, then trying to call the entrypoint.
With the full image, doing this:
$ podman run -it officialKicBase /bin/sh
is enough to trigger the error…
Just an sh?
What could’ve possibly gone wrong?
EDIT
Got rid of it by removing the ENTRYPOINT directive inside the Dockerfile.
I thought that appending a command at the end of podman run would override the entrypoint..
I tried to strip everything except the systemd parts from the container, since it seems to be responsible for the error. I ended up with:
FROM golang:1.19.5 as auto-pause
WORKDIR /src
COPY pkg/ ./pkg
COPY cmd/ ./cmd
COPY deploy/addons ./deploy/addons
COPY translations/ ./translations
COPY third_party/ ./third_party
COPY go.mod go.sum ./
ARG TARGETARCH
ENV GOARCH=${TARGETARCH}
ARG PREBUILT_AUTO_PAUSE
RUN if [ "$PREBUILT_AUTO_PAUSE" != "true" ]; then cd ./cmd/auto-pause/ && go build -o auto-pause-${TARGETARCH}; fi
FROM ubuntu:focal-20221019 as kicbase
ARG BUILDKIT_VERSION="v0.11.0"
ARG FUSE_OVERLAYFS_VERSION="v1.7.1"
ARG CONTAINERD_FUSE_OVERLAYFS_VERSION="1.0.3"
ARG CRIO_VERSION="1.24"
ARG CRI_DOCKERD_VERSION="0de30fc57b659cf23b1212d6516e0cceab9c91d1"
ARG TARGETARCH
COPY deploy/kicbase/10-network-security.conf /etc/sysctl.d/10-network-security.conf
COPY deploy/kicbase/11-tcp-mtu-probing.conf /etc/sysctl.d/11-tcp-mtu-probing.conf
COPY deploy/kicbase/02-crio.conf /etc/crio/crio.conf.d/02-crio.conf
COPY deploy/kicbase/containerd.toml /etc/containerd/config.toml
COPY deploy/kicbase/containerd_docker_io_hosts.toml /etc/containerd/certs.d/docker.io/hosts.toml
COPY deploy/kicbase/clean-install /usr/local/bin/clean-install
COPY deploy/kicbase/entrypoint /usr/local/bin/entrypoint
COPY deploy/kicbase/CHANGELOG ./CHANGELOG
COPY --from=auto-pause /src/cmd/auto-pause/auto-pause-${TARGETARCH} /bin/auto-pause
RUN echo "Ensuring scripts are executable ..." \
&& chmod +x /usr/local/bin/clean-install /usr/local/bin/entrypoint \
&& echo "Installing Packages ..." \
&& DEBIAN_FRONTEND=noninteractive clean-install \
systemd \
conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod \
libseccomp2 pigz \
bash ca-certificates curl rsync \
nfs-common \
iputils-ping netcat-openbsd vim-tiny \
&& find /lib/systemd/system/sysinit.target.wants/ -name "systemd-tmpfiles-setup.service" -delete \
&& rm -f /lib/systemd/system/multi-user.target.wants/* \
&& rm -f /etc/systemd/system/*.wants/* \
&& rm -f /lib/systemd/system/local-fs.target.wants/* \
&& rm -f /lib/systemd/system/sockets.target.wants/*udev* \
&& rm -f /lib/systemd/system/sockets.target.wants/*initctl* \
&& rm -f /lib/systemd/system/basic.target.wants/* \
&& echo "ReadKMsg=no" >> /etc/systemd/journald.conf \
&& ln -s "$(which systemd)" /sbin/init \
&& echo "Ensuring /etc/kubernetes/manifests" \
&& mkdir -p /etc/kubernetes/manifests \
&& echo "Adjusting systemd-tmpfiles timer" \
&& echo "Disabling udev" \
&& systemctl disable udev.service \
&& echo "Modifying /etc/nsswitch.conf to prefer hosts" \
ENV container docker
STOPSIGNAL SIGRTMIN+3
ARG COMMIT_SHA
USER root
ARG VERSION_JSON
RUN echo "${VERSION_JSON}" > /version.json
COPY deploy/kicbase/automount/minikube-automount /usr/sbin/minikube-automount
COPY deploy/kicbase/automount/minikube-automount.service /usr/lib/systemd/system/minikube-automount.service
RUN ln -fs /usr/lib/systemd/system/minikube-automount.service \
/etc/systemd/system/multi-user.target.wants/minikube-automount.service
COPY deploy/kicbase/scheduled-stop/minikube-scheduled-stop /var/lib/minikube/scheduled-stop/minikube-scheduled-stop
COPY deploy/kicbase/scheduled-stop/minikube-scheduled-stop.service /usr/lib/systemd/system/minikube-scheduled-stop.service
RUN chmod +x /var/lib/minikube/scheduled-stop/minikube-scheduled-stop
RUN rm -rf \
/usr/share/doc/* \
/usr/share/man/* \
/usr/share/local/*
RUN echo "kic! Build: ${COMMIT_SHA} Time :$(date)" > "/kic.txt"
### then make local-kicbase builds it
### then docker_save/podman_load ....
Which doesn’t reproduce the error.. which seems bogus, since everything else in that Dockerfile is just installing stuff..
I successfully got a shell inside the container.
Something as simple as # systemctl list-units wouldn’t work, ‘cause:
root@a26f881f092a:/# systemctl list-units
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
Alright.. so something must be invoking systemd in some way somewhere..
What about the previous entrypoint?
# original Dockerfile @ ./deploy/kicbase/Dockerfile
#...
COPY deploy/kicbase/entrypoint /usr/local/bin/entrypoint
# NOTE: this is *only* for documentation, the entrypoint is overridden later
ENTRYPOINT [ "/usr/local/bin/entrypoint", "/sbin/init" ]
## dunno where it's overridden.. whatever..
Well.. dunno why I wasn’t able to override the entrypoint with the podman run command (most likely because a trailing command only overrides CMD and is passed as arguments to the ENTRYPOINT; replacing it needs --entrypoint), but let’s try it out:
# from inside the container
$ /usr/local/bin/entrypoint
...
ERROR: UserNS: cpu controller needs to be delegated
+ exit 1
yep.. Here it is. Let’s look at what it’s doing..
At the top of the file there is already a set -x, which makes all the ‘+’/’++’-prefixed lines of output appear.. We can start reading from there…
These are the only lines we need:
# deploy/kicbase/entrypoint
# If /proc/self/uid_map 4294967295 mappings, we are in the initial user namespace, i.e. the host.
# Otherwise we are in a non-initial user namespace.
# https://github.com/opencontainers/runc/blob/v1.0.0-rc92/libcontainer/system/linux.go#L109-L118
userns=""
if grep -Eqv "0[[:space:]]+0[[:space:]]+4294967295" /proc/self/uid_map; then
userns="1"
echo 'INFO: running in a user namespace (experimental)'
fi
# then a bunch of definitions..
validate_userns() {
if [[ -z "${userns}" ]]; then
return
fi
local nofile_hard
nofile_hard="$(ulimit -Hn)"
local nofile_hard_expected="64000"
if [[ "${nofile_hard}" -lt "${nofile_hard_expected}" ]]; then
echo "WARN: UserNS: expected RLIMIT_NOFILE to be at least ${nofile_hard_expected}, got ${nofile_hard}" >&2
fi
if [[ -f "/sys/fs/cgroup/cgroup.controllers" ]]; then
for f in cpu memory pids; do
if ! grep -qw $f /sys/fs/cgroup/cgroup.controllers; then
## [!] we're dying here:
echo "ERROR: UserNS: $f controller needs to be delegated" >&2
exit 1
fi
done
fi
}
# then other bunch of definitions..
# ultimately validate_userns gets called
# validate state
validate_userns
I thought there was some bug inside the grep or something..
There’s nothing wrong.. it’s the original intent of the author (git blame him :).
Also.. if we run a bash in the same container via sudo podman or docker, we get past that line.
I have no idea what this means.
fastforward…
Ok, now I have..
I was about to directly ask the person responsible for that line of code (he was online), but as happens to me a lot.. while I was writing the question.. I investigated further.. ’till I got the answer..
What happens is the following:
- this article talks about the migration from cgroups v1 to cgroups v2 and its status (at that time).. it also talks about podman and rootless containers and explains how to solve our issue (search for podman run --cpus), as well as why it’s an issue.
- why is that an issue.. more in detail (plus the link in the article is broken). It actually was considered an issue with cgroups v1.. but then with cgroups v2 it became more like running a less-rootless container..? Still figuring..
- the solution expanded, even tho at the end it says to reboot.. actually a sudo systemctl daemon-reload is enough for the delegation to take effect. Although I admit that I rebooted to undelegate… Also podman is proposing it.
- one level deeper explanation of what’s happening. I was wondering why this is a thing.. why setting my own resources should imply more privileges than my own.. This kinda explains it.. "Because the resource control interface files in a given directory control the distribution of the parent’s resources, the delegatee shouldn’t be allowed to write to them". But I’m not sure if I got it right… (TODO)
- doesn’t work under WSL as of August 29, 2022. Dunno why I posted it.. stumbled upon it and it just seemed interesting (dunno its current state, but I can close this tab now..)
So it’s not an issue at all.. it’s just extra care.
IT IS actually part of the minikube documentation, in that “See the Rootless Docker” link.. dunno why I missed it (so many times).
..Luckily we have another issue:
The other issue…
Oh no, wait..
There was none.
The END.