-
Notifications
You must be signed in to change notification settings - Fork 65
src debug command #731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src debug command #731
Conversation
keeping up
…turns without error
…ontaining logs, past logs, and manifests
This looks great! I wonder if this might be a good opportunity to remove the report a bug page entirely and ask admins to provide a dump using this new command instead? |
@bobheadxi Maybe we can synthesize the page and this command. I'd like to eventually add some more functionality to this command. In particular I'd love for it to include a dump of the prometheus data, with a supporting way to reconstruct our grafana interface from that data. I like the alerts that are listed In the site admin debug page though, and I'll def review those PRS for further Ideas here. Listing for later: |
Happy to chat about the alerts summary implementation if you need help/more context! It's been a looong time but I also find the "smart summary" concept behind it neat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaedalusG Great work 💯 ! this looks great 🔥 I think it would also be helpful to be able to see if there are any alerts being fired in the instance from the Prometheus data to be in the data dump. it would also be great to have a docs page we could share with our site admins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much like @ericfritz, I don't think there's anything here that blocks merge, other than maybe one missing error return. I've tried to call out places where I think Go conventions would normally lead to something slightly different, or where there might be some room for future improvements, but none of that requires fixes in this PR.
Great job conceiving and leading this!
type archiveFile struct { | ||
name string | ||
data []byte | ||
err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unusual in Go to bundle an error field into a structure, as Go tends to strongly prefer returning error
values as separate return parameters. Normally, the "constructor" would return something like *archiveFile, error
, and let callers handle err
that way, rather than having to introspect the returned *archiveFile
. This tends to look something like this in practice:
type myType struct {
derivedFoo any
derivedBar any
}
func NewMyType(foo any, bar any) (*MyType, error) {
derivedFoo, err := doSomethingWithFoo(foo)
if err != nil {
return nil, errors.Wrap(err, "could not derive foo")
}
derivedBar, err := doSomethingWithBar(bar)
if err != nil {
return nil, errors.Wrap(err, "could not derive bar")
}
return &MyType{
derivedFoo: derivedFoo,
derivedBar: derivedBar,
}, nil
}
Where we do tend to break this in Sourcegraph is where the error actually needs to persist across invocations — the most common case is where a GraphQL resolver does some sort of one time internal call to retrieve things from the database, but then might need to return the same error for each field that was requested in the GraphQL query.
You don't need to change this, but it's worth being aware of for future Go projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will investigate after merge, ty ty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LawnGnome FYI these values go through a channel which is why it's packaged as a struct in the first place. Does that change this advice?
Co-authored-by: Adam Harvey <[email protected]>
Co-authored-by: Adam Harvey <[email protected]>
Co-authored-by: Adam Harvey <[email protected]>
Co-authored-by: Adam Harvey <[email protected]>
Co-authored-by: Adam Harvey <[email protected]>
…into src-debugger
Small follow-up: #743 |
* setup debug command scaffold * added test kubectl events logging * debug: Create ZIP archive * debug: Fail if zip file already exists * debug: introduce directory structure in zip * added --all-naamespaces to get events command * populated with TODO tasks, add get pods output * setting up k8s get pods to enable grabbing logs * got pod names parsed to slice * pulling logs from all pods, TODO: handle subcontainers * add missing .txt string to log archive names * used json get pods out, refactored k8s logs into their own spot * refactor get logs into savek8sLogs * working on logs for previous pods, need to only write when command returns without error * archives prev-logs if exists * refactoring getPods to handler scope * refactored all kubectl call functions to be declared outside of handler * added pod manifests and describes * added some more TODOs * added a little error handling on archives * corrected some error handling * added validation on out flag, and setup for deployment flag * improved logic for deployment flag * refactored deployment script * added get PV and PVC, used out flag as baseDir for archive filestructure * changed archive filestructure to be oriented around pod directories containing logs, past logs, and manifests * cleaned up some finished TODOs, shortened selector values for -d flag * refactored kubectl archiving into -d flag switch * working get containers * Concurrency!! * made deploy level kubectl functions concurrent * figured out bug in getLogs caused by immediately invoked function and variable scope in for loop * all kubectl functions refactored to run as goroutines, bug present causing some zip writes to write empty files * added some prints for debugging * clean up some flag validating prints and comments * A bunch of improvements 1. Set open file limits in process to support this much concurrency. This could also be done in the shell with `ulimits -n 99999`. 2. Pass in `context.Context` so we cancel any pending go-routines when returning early. 3. Change getXXX signatures from returning a slice to a single *archiveFile (since we were never actually returning more than one file) 4. Check for error in the file archiving loop and abort if there's any. 5. Verbose logging support. * add todo for logging * fixed hardcoding 999999 in setOpenFileLimits * removed plural k8s func names, started work on docker commands * working log archival for docker containers * added docker inspect * added docker container stats * improved logic for get containers * cleanup * clarify assumptions * introduced debug sub commands kube, comp, and serv -- Ex: src debug kube -out=test * for some reason my .gitignore was ignoring these new files * changed flagset to use .StringVar rather than .String in flagSet package * correct flag typo in serv * fixed verbose output in kube command, changed -out flag to -o * switching to passanger, stating getConfig * ignore errors and get extsvc configs * add some comments from work with Tomas * added namespace flag for kube subcommand * added a little safety check to the kube command * added a semaphore to kube, added some text safty checks * added semphore to all RPC calls in archiveKube * added current-context logging before kube command execution * Atempting to debug error inconsistent --previous logs call * added network validation to comp subcommand * added a function to pull site config, cleaned up logging and usage funcs * moved sub-functions to relevant file, trial utility archiveFileFromCommand in src extsvc function * converted all kube commands to , renamed kube getContainers to getPods * refactored compose functions with archiveFileFromCommand * Made archiveFileFromCommand calls aesthetically pleasing, added docker ps, and get pods commands as a sort of index * changes all path package calls to path/filepath calls * emptied graveyard * return early from failed semaphore.Acquire rather than reading error * handle errors, remove direct comparison to booleans * cleaned up some more linter erros * refactor adding verify func * fixed final improperly handled error * use semaphore in debug comp * working errgroups in comp * refactor to errgroup in kube * refactored writer in archive functions * corrected all complex filepath calls using fmt.Sprintf() * filter docker ps output to appropriate network * removed setup debug as will as openfile limit adjustment * normalize site config * correctly process json * fleshed out serv command * Update .gitignore Co-authored-by: Eric Fritz <[email protected]> * Apply suggestions from code review Implementation of Eric's review suggestions; untested, requiring duplication in other files Co-authored-by: Eric Fritz <[email protected]> * implement and repair infered refactors in erics suggestions * addressed many suggested code improvements for readability and style; still needs run through for error handling, and potentially some more refactors * refactor baseDir processor * went over errors, still probably needs another run since I widely used errors.Wrapf * clarify all string serializations * Update cmd/src/debug_comp.go Co-authored-by: Eric Fritz <[email protected]> * Update cmd/src/debug_common.go Co-authored-by: Eric Fritz <[email protected]> * Update cmd/src/debug_common.go Co-authored-by: Eric Fritz <[email protected]> * Update cmd/src/debug_kube.go Co-authored-by: Eric Fritz <[email protected]> * Update cmd/src/debug_kube.go Co-authored-by: Eric Fritz <[email protected]> * Update cmd/src/debug_kube.go Co-authored-by: Eric Fritz <[email protected]> * finishing touches * final error handling cleanup * add changelog entry * Update cmd/src/debug_serv.go Co-authored-by: Adam Harvey <[email protected]> * Update cmd/src/debug_comp.go Co-authored-by: Adam Harvey <[email protected]> * Update cmd/src/debug_kube.go Co-authored-by: Adam Harvey <[email protected]> * Update cmd/src/debug_kube.go Co-authored-by: Adam Harvey <[email protected]> * Update cmd/src/debug.go Co-authored-by: Adam Harvey <[email protected]> * address harvey suggestions * finalize changes from harvey's suggestions * fix go.mod * really fix go.mod Co-authored-by: Tomás Senart <[email protected]> Co-authored-by: Tomás Senart <[email protected]> Co-authored-by: Eric Fritz <[email protected]> Co-authored-by: Adam Harvey <[email protected]>
Src Debug
src debug
is a little pet project I've been working on for a long time now as a way to learn golang. The intention here is to create a cli command that collects the data from common diagnostic commands to speed up debugging and make sharing information about a Sourcegraph instance easier for our support team.The command is divided into three separate sub commands (
server
,compose
, andkube
) each specific to their deployment type and with some unique flags and values.Here is an example of the return from a
src debug server
command:This is the simplest version of the command and output
For an abridged example of kubernetes --
... after unzip
Here you can see the general directory structure outputted in the zip archive and get an idea about the data captured by the command.
Test plan
I'd love to get some testing from friends here 🙏
I tested each command with every iteration against the two AER test environments cse-aws (the command was tested against a local compose instance for
docker
cli outputs) and cse-k8s. These instances are fairly small so it might be a good idea to test out thekube
command against a much larger instance to test out the semaphore throttling, I don't havekubectl
access to.com
but that would be a good test ground.Testing amounted to executing the command and checking to make sure that expected outputs were received and written to a zip file.
To run the command checkout the
src-debugger
and run the command from there:Note that the command makes use of your
SRC_ENDPOINT
to gathersiteconfig.json
and external service config json. It also uses your local machine kubectl context to target the Sourcegraph kubernetes cluster, and a host machines docker cli.Feel free to hit me here or DM me with any questions!