Features
The following is a description of features that are commonly expected in Sentry SDKs.
Unified API
Make sure you have also read the unified API documentation, which explains the common API design.
Events should be transmitted in a background thread or similar system. This queue must be flushed when the application shuts down with a specific timeout. This feature is typically user facing and explained as part of shutdown and draining.
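A minimal sketch of the idea, assuming a single worker thread and an in-memory queue. The BackgroundWorker class, its method names, and the send callable are hypothetical and not part of any Sentry SDK API. Flushing is done by enqueuing a marker and waiting for the worker to reach it, which bounds the wait by the given timeout.

```python
import queue
import threading


class BackgroundWorker:
    """Hypothetical worker that transmits events off the calling thread."""

    def __init__(self, send):
        self._send = send  # callable that performs the actual transmission
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, event):
        """Enqueue an event without blocking the caller."""
        self._queue.put(event)

    def _run(self):
        while True:
            item = self._queue.get()
            if isinstance(item, threading.Event):
                item.set()  # flush marker: everything queued before it is done
                continue
            try:
                self._send(item)
            except Exception:
                pass  # a failed send must not kill the worker thread

    def flush(self, timeout):
        """Block up to `timeout` seconds; return True if the queue drained."""
        marker = threading.Event()
        self._queue.put(marker)
        return marker.wait(timeout)
```

An application would call flush with its shutdown timeout right before exiting. If flush returns False, the timeout elapsed and the remaining events are lost.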
Ability for the SDK to be set as a hook to record any uncaught exceptions. At the language level this is typically a global hook provided by the language itself. For framework integrations this might be part of middleware or some other system.
This behavior is typically provided by a default integration that can be disabled.
Scopes should be provided by SDKs to set common attributes and context data on events sent to Sentry emitted from the current scope. They should be inherited by lower scopes so that they can be set "globally" on startup. Note that some attributes can only be set in the client options (release, environment) and not on scopes.
What scope means depends on the application. For a web framework it is most likely a single request/response cycle. For a mobile application there is often just one single scope that represents the single user and their actions. Scoping can be difficult to implement because it often has to deal with threads or concurrency and can involve deep integration with frameworks. See the scopes page for more information.
Automatic addition of useful attributes such as tags, extra, or specific contexts. This typically means the SDK hooks into a framework so that it can set attributes that are known to be useful for most users. Please check Data Handling for considerations.
Manually record application events (into the current scope) during the lifecycle of an application. Implement a ring buffer so as not to grow indefinitely. The most recent breadcrumbs should be attached to events as they occur.
With deeper framework integration, the automatic recording of breadcrumbs is possible and recommended, for example:
- UI Events: button clicks, touch events, etc.
- System Events: low battery, low storage space, airplane mode started, memory warnings, device orientation changed, etc.
- Outgoing HTTP requests
Check out the complete breadcrumb documentation for more types.
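The ring buffer described above can be sketched with a bounded deque. This is an illustration only: the Scope class shown here, the cap of 100, and the shape written into the event are assumptions, not the SDK's actual API.

```python
from collections import deque

MAX_BREADCRUMBS = 100  # assumed cap; the text above does not specify a number


class Scope:
    """Hypothetical scope that keeps only the most recent breadcrumbs."""

    def __init__(self, max_breadcrumbs=MAX_BREADCRUMBS):
        # deque(maxlen=...) silently drops the oldest entry once it is full
        self._breadcrumbs = deque(maxlen=max_breadcrumbs)

    def add_breadcrumb(self, crumb):
        self._breadcrumbs.append(crumb)

    def apply_to_event(self, event):
        # Attach a snapshot of the most recent breadcrumbs, oldest first
        # (the "values" wrapper is an assumed payload shape)
        event["breadcrumbs"] = {"values": list(self._breadcrumbs)}
        return event
```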
SDKs should allow the user to configure what percentage of events are actually sent to the server (the rest should be silently ignored). For example:
sample_rate = options.get('sample_rate', 1.0)
# assuming random() returns a value between 0.0 (inclusive) and 1.0 (exclusive)
if random() < sample_rate:
    transport.capture_event(event)
To further simplify ignoring certain events from being sent to Sentry, it is also suggested to provide ignoreTransactions and ignoreErrors (or exception; choose the terminology which is best for the platform). ignoreTransactions specifically should contain an array of transaction names, as stored in the transaction event schema (e.g. GET /info). These options should provide a simple way for users to discard (ignore) events before they are sent to Sentry. This prevents sending data which is undesired and may consume quota or resources on the Sentry server.
ignore_transactions = ['GET /api/health','/api/v1/*']
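One possible way to evaluate such a list is shown below. The text does not define the matching rules, so this sketch assumes shell-style wildcards (as the /api/v1/* entry suggests) and uses a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def should_ignore_transaction(name, ignore_transactions):
    """Return True if the transaction name matches any configured pattern.

    Assumption: patterns are case-sensitive shell-style globs; an SDK may
    instead choose exact string matching or regular expressions.
    """
    return any(fnmatchcase(name, pattern) for pattern in ignore_transactions)
```

With the list above, GET /api/health and /api/v1/users would be dropped, while GET /api/users would still be sent.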
Respect Sentry’s HTTP 429 Retry-After header, or, if the SDK supports multiple payload types (e.g. errors and transactions), the X-Sentry-Rate-Limits header. Outgoing SDK requests should be dropped during the backoff period.
See Rate Limiting for details.
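A minimal sketch of the backoff bookkeeping, assuming the Retry-After value is given in seconds. The RateLimiter class and the 60 second fallback are hypothetical; the HTTP-date form of Retry-After and the per-category X-Sentry-Rate-Limits header are deliberately left out.

```python
import time

DEFAULT_RETRY_AFTER = 60.0  # assumed fallback when the header is missing or unparsable


class RateLimiter:
    """Hypothetical helper that tracks the backoff period after a 429."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._disabled_until = 0.0

    def update_from_response(self, status_code, headers):
        if status_code != 429:
            return
        try:
            delay = float(headers.get("Retry-After", ""))
        except ValueError:
            delay = DEFAULT_RETRY_AFTER
        self._disabled_until = self._clock() + delay

    def is_rate_limited(self):
        """While this returns True, outgoing requests should be dropped."""
        return self._clock() < self._disabled_until
```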
Backend SDKs (typically used in server applications) should have backpressure management logic that dynamically downsamples transactions when the throughput in the system is too high.
See Backpressure Management for details.
Every SDK must implement a debug mode, which is disabled by default. Users can enable it by setting the option debug to true. If the debug mode is enabled, the SDK prints out useful debugging information for two main purposes:
- It helps users identify and fix SDK problems, such as misconfigurations or something going wrong with sending data to Sentry.
- It can help Sentry SDK engineers investigate issues in Sentry SDKs, which can be done while developing or by asking a user to enable the debug mode and share the debug logs.
As the log output can add unnecessary overhead, we advise users to refrain from enabling the debug mode in production. Still, SDKs may log fatal error messages even when the debug mode is disabled. When using debug mode extensively for the second use case, we recommend adding the diagnostic level because users could easily miss error or fatal log messages.
SDKs may offer an optional diagnostic level, which controls the verbosity of the debug mode. There are five different levels:
- debug: The most verbose mode
- info: Informational messages
- warning: Warning that something might not be right
- error: Only SDK internal errors are printed
- fatal: Only critical errors are printed
The default level can be debug or error depending on the SDK's usage of log messages. When the volume of log messages is low, the default can be debug. When it is high, users might easily miss error or fatal messages. In that case, the SDK should set the default level to error. This choice must be clearly documented in the user-facing docs.
So users can easily spot errors during development, the SDK can automatically enable debug mode when it reliably detects it's running in a debug build. When doing so, the SDK must communicate this in the docs and with a log message when initializing the SDK. When the user explicitly sets the debug option to false, the SDK must disable the debug mode.
Stack parsing can tell which frames should be identified as part of the user’s application (as opposed to part of the language, a library, or a framework), either automatically or by user configuration at startup, often declared as a package/module prefix.
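The user-configured variant can be sketched as a prefix check on each frame's module. The function names, the in_app_prefixes parameter, and the frame dictionary keys are assumptions made for this illustration.

```python
def is_in_app(module, in_app_prefixes):
    """Return True if `module` equals or lives under one of the prefixes."""
    return any(
        module == prefix or module.startswith(prefix + ".")
        for prefix in in_app_prefixes
    )


def mark_in_app(frames, in_app_prefixes):
    """Set an `in_app` flag on each frame dict based on its `module` key."""
    for frame in frames:
        frame["in_app"] = is_in_app(frame.get("module") or "", in_app_prefixes)
    return frames
```

Matching on prefix plus a separator avoids treating a module such as myapplication as part of the myapp package.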
Lines of source code to provide context in stack traces. This is easier in interpreted languages and may be hard or impossible in compiled ones.
Local variable names and values for each stack frame, where possible. Restrictions apply on some platforms, for example it may only be possible to collect the values of parameters passed into each function, or it may be completely impossible to collect this information at all.
This functionality should be gated behind the includeLocalVariables option, which is true by default.
An SDK may expose a Scope property for tracking feature flag evaluations. When the scope forks, it's important to clone the feature flags property. Leaking flag evaluations between threads could lead to inaccurate feature flag evaluation logs.
The Scope's flag property should have a capped capacity and should prefer recently-evaluated flags over less-recently-evaluated flags. The recommended data structure is an LRU-cache, but it is not required so long as the data structure behaves similarly. Serious deviations from the behavior of an LRU-cache should be documented for your language.
The Scope's flag property should expose two methods: get/0 and set/2. set/2 takes two arguments. The first argument is the name of the flag. It is of type string. The second argument is the evaluation result. It is of type boolean. set/2 should remove all entries from the LRU-cache which match the provided flag's name and append the new evaluation result to the end of the queue. get/0 accepts zero arguments. The get/0 method must return a list of serialized flag evaluation results in order of evaluation. Oldest values first, newest values last. See the Feature Flag Context protocol documentation for details.
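The behavior described above can be sketched with an insertion-ordered dictionary. The FlagBuffer class, the capacity of 100, the copy method used when a scope forks, and the serialized entry shape are assumptions for illustration; the Feature Flag Context protocol documentation defines the real payload.

```python
class FlagBuffer:
    """Hypothetical capped store of flag evaluations, oldest first."""

    def __init__(self, capacity=100):  # capacity is an assumed default
        self._capacity = capacity
        self._entries = {}  # dicts preserve insertion order

    def set(self, flag, result):
        # Remove any earlier evaluation of this flag, then append the new one
        self._entries.pop(flag, None)
        self._entries[flag] = result
        if len(self._entries) > self._capacity:
            oldest = next(iter(self._entries))
            del self._entries[oldest]

    def get(self):
        # Oldest evaluations first, newest last (entry shape is assumed)
        return [{"flag": name, "result": result}
                for name, result in self._entries.items()]

    def copy(self):
        # Clone on scope fork so evaluations do not leak between scopes
        clone = FlagBuffer(self._capacity)
        clone._entries = dict(self._entries)
        return clone
```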
Integrations automate the work of tracking feature flag evaluations and serializing them on error context. An integration should hook into a third-party SDK and record feature flag evaluations using the current scope. For example, in Python an integration would call sentry_sdk.get_current_scope().flags.set(...) on each flag evaluation.
An integration is also responsible for registering an "on error" hook with the Sentry SDK. When an error occurs the integration should request the current scope and serialize the flags property. For example, in Python an integration might define this callback function:
def flag_error_processor(event, exc_info):
    scope = sentry_sdk.get_current_scope()
    event["contexts"]["flags"] = {"values": scope.flags.get()}
    return event
Turn compiled or obfuscated code/method names in stack traces back into the original. Desymbolication always requires Sentry backend support. Not necessary for many languages.
Ability to get the ID of the last event sent. Event IDs are useful for correlation, logging, customers rolling their own feedback forms, etc.
For all SDKs, it is strongly recommended to send the User Feedback as an envelope item. Alternatively, the SDKs can use the User Feedback endpoint, which is not recommended.
On user-facing platforms such as mobile, desktop, or browser this means first-class support for requesting User Feedback when an error or crash occurs. To see some examples of the API check out the user-facing docs for Apple and Java.
On mobile and desktop, it is common to prompt the user for feedback after a crash happened on the previous run of the application. Therefore the SDKs should implement the onCrashedLastRun callback on the options. This callback gets called shortly after the initialization of the SDK when the last program execution terminated with a crash. The SDK should execute the callback only once during the entire run of the program to avoid multiple callbacks if there are multiple crash events to send.
On backend platforms, SDKs should document how to use the last event ID to prompt the user for feedback themselves.
User Feedback class:
Envelope item:
Attachments are files stored alongside an event. To send an attachment, add it as an envelope item to the corresponding event.
We recommend implementing two types of attachments, one with a path and another with a byte array. If the programming language allows it, create one class with multiple constructors to keep things simple and guess the content type of the attachment via the filename.
The overload that takes a path should consider:
- The SDK should read the file when an event gets captured and not when the user adds an attachment to the scope.
- If reading the attachment fails, the SDK should not drop the whole envelope, but just the attachment's envelope item.
- If the SDK is in debug mode (debug=true), log errors to make debugging easier.
If the SDK supports transactions, the attachments should offer a flag addToTransactions that specifies whether the SDK adds the attachment to every transaction. The default should be false.
Use the implementations of Java, Objective-C, or Python as a reference for the API.
Alongside the implementation of attachments, add maxAttachmentSize to the options and set the default to 20 MiB. When converting an attachment to an envelope item, the SDK must discard items larger than maxAttachmentSize. This is especially useful on SDKs with offline caching, typical on mobile, because attachments could quickly eat up the users' disk space. Furthermore, Relay has a maximum size for attachments, and we want to reduce unnecessary requests.
When the user opts-in, if technically possible, take a screenshot of the application during a crash or error and include it as an attachment to the envelope with the event.
This feature only applies to SDKs with a user interface, such as Mobile and Desktop. In some environments such as native iOS, taking a screenshot requires the UI thread, which might not be available in the event of a crash, so this feature is inherently a best-effort solution. Also, some environments don't allow access to the UI or some features during a hard crash. iOS, for example, doesn't allow running Objective-C code after a signal break, so no hard crash screenshot capture is possible. It's advised to provide this feature through a single option called attachScreenshot. That's the preferred way, but on platforms such as Flutter a wrapping widget is required, so documentation can point users to that instead of the suggested option name.
The feature is achieved by adding an attachment with:
- File name screenshot.jpg or screenshot.png
  - Subsequent screenshots in the same event should be named screenshot-n, where n is the screenshot number starting with 2
- Image size, if possible, should stay below 2 MB, but quality/size could be configurable
- ContentType: image/jpg or ContentType: image/png
Whenever possible, avoid adding the attachment altogether if taking the screenshot fails. Alternatively, when streaming, it's possible the envelope header was already flushed through before the attempt to take the screenshot happens. In this case, a 0 byte attachment will be included. In that case, Sentry will not show a screenshot preview.
Hook called with the event (and on some platforms the hint) that allows the user to decide whether an event should be sent or not. This can also be used to further modify the event. This only works for error events. For transactions it is recommended to have beforeSendTransaction implemented in SDKs.
Hook called with the breadcrumb (and on some platforms the hint) that allows the user to decide whether and how a breadcrumb should be sent.
Include a list of loaded libraries (and versions) when sending an event.
This feature is also known as 'Offline Caching'.
Write events to disk before attempting to send, so that they can be retried in the event of a temporary network failure. The SDK needs to implement a cap on the number of stored events. This is mostly useful on mobile and desktop (e.g. laptop) apps, where stable connectivity is often not available.
It's important to note that retry is only considered in the event of a network failure. For example:
- Connection timeout
- DSN resolution failure
- Connection reset by peer
For other failures, like those caused by processing the file in the SDK itself, the payload should be discarded since retrying is likely to end up in an endless loop. If the event reached Sentry and an HTTP response status code was received, even in the event of a 500 response, the event should be discarded.
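The retry rule can be sketched as a small decision function. The function name and its two parameters are hypothetical, and the exception types are Python's standard mapping of the network failures listed above; other platforms will have different error types.

```python
import socket

# Network-level failures that justify keeping the cached event for a retry
RETRYABLE_ERRORS = (
    ConnectionError,   # includes "connection reset by peer"
    TimeoutError,      # connection timeout
    socket.timeout,    # alias of TimeoutError on recent Python versions
    socket.gaierror,   # name resolution failure
)


def should_retry(error=None, status_code=None):
    """Decide whether a cached event stays on disk after a send attempt."""
    if status_code is not None:
        # Sentry answered, so the event is discarded, even on a 500
        return False
    return isinstance(error, RETRYABLE_ERRORS)
```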
Consider having the SDK retry sending events once the device is back online, when such notification exists in the platform.
Once the device is back online, the SDK is likely going to empty its disk queue in a quick burst of requests. This can trigger different abuse filters in Sentry. To account for that, consider adding a small delay between cached event captures. A recommended value is 100 milliseconds.
If the SDK is being rate-limited, which causes the SDK to drop any event that reaches its HTTP transport, consider pausing consumption of the disk cache until the Retry-After timeout is reached or the app restarts.
We recommend implementing this feature for mobile and desktop SDKs.
If the application crashes shortly after the init of the SDK, the SDK should provide a mechanism to guarantee transmission to Sentry. Ideally, SDKs could send the events in a separate process not impacted by the crashing application. With the limitations on mobile platforms, spawning an extra process only for sending envelopes is hard to achieve or impossible. The SDKs on these platforms send envelopes on a background thread to not block the UI thread or because they forbid network operations on the UI thread. A crash occurring shortly after the SDK init could lead to never reporting such crashes, keeping the users unaware of a critical bug.
When the app crashes, the SDK needs to check whether the crash happened within two seconds after the SDK init. If it did, the SDK needs to store that information on disk. We recommend using a marker file, which the SDK checks on initialization. If the SDK can store this information in another place without creating an additional marker file, the recommendation is to use that approach to avoid the extra IO. Otherwise, we accept the tradeoff of extra IO to be able to detect start-up crashes.
If the platform allows it, the SDK may call flush directly after the detected start-up crash occurs and before the application terminates. If the SDK can guarantee transmission to Sentry while crashing, it can skip creating a marker file and making a blocking flush call on the next initialization.
If the marker file exists upon the next SDK initialization, the SDK should clear the marker and block the init execution for up to five seconds in order to flush out pending envelopes. If the timeout of five seconds is exceeded, the SDK should release the init lock and continue flushing on a background thread.
While, ideally, the SDK should only flush out the crash event envelope, it is acceptable to call flush for all envelopes to reduce the complexity, as most of the time, there shouldn't be too many envelopes in the offline cache.
We decided against making this feature configurable. The only reason to disable it would be if the feature is broken; hence users can't disable it. Users also can't modify the duration for detecting start-up crashes, which is two seconds, or the flush duration, which is five seconds, because users usually don't know which values to pick, so we choose the proper ones for them. We can always add options for these values later.
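The marker-file flow can be sketched as follows. The StartupCrashDetector class, its hook names, and the flush callable are hypothetical; only the two-second window and the five-second flush timeout come from the text. The flush callable is assumed to block for at most the given timeout and to keep sending in the background afterwards.

```python
import os
import time

STARTUP_CRASH_WINDOW = 2.0   # seconds after init that count as a start-up crash
STARTUP_FLUSH_TIMEOUT = 5.0  # seconds that init may block to flush envelopes


class StartupCrashDetector:
    """Hypothetical helper implementing the marker-file approach."""

    def __init__(self, marker_path, flush, clock=time.monotonic):
        self._marker_path = marker_path
        self._flush = flush  # callable(timeout), assumed to block up to timeout
        self._clock = clock
        self._init_time = None

    def on_init(self):
        """Call during SDK init: flush if the previous run crashed on start-up."""
        self._init_time = self._clock()
        if os.path.exists(self._marker_path):
            os.remove(self._marker_path)
            self._flush(STARTUP_FLUSH_TIMEOUT)

    def on_crash(self):
        """Call from the crash handler: leave a marker for the next run."""
        if self._clock() - self._init_time <= STARTUP_CRASH_WINDOW:
            with open(self._marker_path, "w"):
                pass  # an empty file is enough, its existence is the signal
```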
Ability to use an HTTP proxy. Often easy to implement using the existing HTTP client. This should be picked up from the system config if possible or explicit config in the client options.
Every HTTP client integration must exclude HTTP requests that match the DSN configured in the Options, so that requests to Sentry itself are not recorded.
Add a breadcrumb for each outgoing HTTP request after the request finishes:
- type: http
- category: http
- level:
  - info - response status code 2XX - 3XX
  - warning - response status code 4XX
  - error - response status code 5XX
- data (all fields are optional but recommended):
  - url - The URL used in the HTTP request
  - http.request.method - uppercase HTTP method, e.g. GET, HEAD
  - http.response.status_code - Numeric status code such as 200 or 404
  - http.query - The query part of the URL
  - http.fragment - The fragment part of the URI (Browser SDKs only)
  - http.request.body.size - Size in bytes
  - http.response.body.size - Size in bytes
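As an illustration, the level mapping and the breadcrumb payload listed above could be built like this. The function names are hypothetical, only a subset of the optional data fields is filled in, and status codes outside 2XX to 5XX fall back to info as an assumption.

```python
def breadcrumb_level(status_code):
    """Map an HTTP response status code to a breadcrumb level."""
    if 500 <= status_code <= 599:
        return "error"
    if 400 <= status_code <= 499:
        return "warning"
    return "info"  # 2XX - 3XX; other codes default to info (assumption)


def http_breadcrumb(method, url, status_code):
    """Build the breadcrumb once the outgoing request has finished."""
    return {
        "type": "http",
        "category": "http",
        "level": breadcrumb_level(status_code),
        "data": {
            "url": url,
            "http.request.method": method.upper(),
            "http.response.status_code": status_code,
        },
    }
```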
If Performance Monitoring is both supported by the SDK and enabled in the client application, a new Span must be created around the HTTP request whenever a transaction is active:
- operation: http.client
- description: $METHOD $url (uppercase HTTP method), e.g. GET https://sentry.io
- HTTP requests must be enhanced with a sentry-trace HTTP header to support distributed tracing
- HTTP requests must be enhanced with a traceparent HTTP header to support distributed tracing
- HTTP requests must be enhanced with a baggage HTTP header to support dynamic sampling
- span status must match HTTP response status code (see Span status to HTTP status code mapping)
- when a network error occurs, span status must be set to internal_error
- span data must follow the Span Data Conventions
The SDK automatically captures HTTP Client errors and sends them to sentry.io.
The HTTP Client integration should have 3 configuration options:
- captureFailedRequests defaults to false due to PII reasons.
  - The SDK will only capture HTTP Client errors if it is enabled.
- failedRequestStatusCodes defaults to 500 - 599. This configuration option accepts a List of HttpStatusCodeRange, which is a range of HTTP status codes (min to max) or a single status_code.
  - The SDK will only capture HTTP Client errors if the HTTP Response status code is within the defined ranges in failedRequestStatusCodes.
  - If the language has a Range type, it should be used instead of HttpStatusCodeRange.
- failedRequestTargets defaults to (.*). This configuration option accepts a List of String that may be regular expressions as well, similar to tracePropagationTargets.
  - The SDK will only capture HTTP Client errors if the HTTP Request URL is a match for any of the failedRequestTargets.
- sensitive headers should only be set if sendDefaultPii is enabled, e.g. Cookie and Set-Cookie.
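How the three options interact can be sketched as a single check. The option keys are written in snake_case and the function name is hypothetical; Python's built-in range stands in for HttpStatusCodeRange, and targets are treated as regular expressions searched anywhere in the URL, which is an assumption about the matching mode.

```python
import re

# Assumed option names, mirroring the defaults described above
DEFAULT_OPTIONS = {
    "capture_failed_requests": False,
    "failed_request_status_codes": [range(500, 600)],  # 500 - 599 inclusive
    "failed_request_targets": [".*"],
}


def should_capture_failed_request(status_code, url, options=DEFAULT_OPTIONS):
    """Return True only if the feature is enabled and both filters match."""
    if not options["capture_failed_requests"]:
        return False
    in_range = any(
        status_code in codes for codes in options["failed_request_status_codes"]
    )
    on_target = any(
        re.search(pattern, url) for pattern in options["failed_request_targets"]
    )
    return in_range and on_target
```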
The HTTP Client integration should capture error events with the following properties:
The Request interface, see the Spec for details.
The Response context, see the Spec for details.
{
  "contexts": {
    "response": {
      "type": "response",
      "cookies": "PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1;",
      "headers": {
        "content-type": "text/html"
        // ...
      },
      "status_code": 500,
      "body_size": 1000 // in bytes
    }
  }
}
The Exception Interface, see the Spec for details.
If the HTTP Client integration does not throw an exception for unsuccessful requests, you can create a synthetic exception following this Spec:
- Set the Exception Mechanism with a proper type such as SentryOkHttpInterceptor.
- Set the Stack Trace Interface with snapshot=true.
- HTTP Client Error with status code: $code.
When capturing error events, pass the original Request and Response objects from the HTTP Client as hints, so that users may filter out events in beforeSend with the full context.
Automatically captured HTTP Client error events can be searchable and alertable with the http.url and http.status_code properties. Learn more about it in the Searchable Properties docs.
As an example, see the OkHttp Client integration for Android.
The GraphQL Client integrations should match the guidelines for HTTP Client Integrations with a few differences:
The failedRequestStatusCodes parameter does not exist because GraphQL errors are not HTTP errors: a request can fail even though the HTTP status code of the response is successful.
Instead, the error has to be captured if the GraphQL response contains an errors array. This can be done by matching the response body against a regular expression, e.g.:
val regex = "(?i)\"errors\"\\s*:\\s*\\[".toRegex()
// [body] is the stringified GraphQL response body
if (regex.containsMatchIn(body)) {
    // captures the error
}
Additional fields for breadcrumbs:
- data (all fields are optional but recommended):
  - operation_name - The GraphQL operation name
  - operation_type - The GraphQL operation type, i.e. query, mutation, subscription
  - operation_id - The GraphQL operation ID
Required fields for the Request interface:
{
  "request": {
    "api_target": "graphql",
    "data": {
      "foo": "bar"
    }
  }
}
The data field is a JSON object that contains the GraphQL request payload.
Required fields for the Response interface:
{
  "contexts": {
    "response": {
      "data": {
        "foo": "bar"
      }
    }
  }
}
The data field is a JSON object that contains the GraphQL response payload. Attaching request and response bodies should be guarded by sendDefaultPii and/or another flag to opt-in (e.g. captureFailedRequests).
Required fields for the Event interface:
The fingerprints field should be set to ["$operationName", "$operationType", "$statusCode"].
{
  "fingerprints": ["$operationName", "$operationType", "$statusCode"]
}
The GraphQL Performance integration should match the guidelines for GraphQL Client Integrations with a few differences:
The transaction's name should be set with the GraphQL operation name, if possible, otherwise fallback to something unique that makes sense, e.g. the canonical name of the actual/generated class.
The transaction's description should be set with the GraphQL operation name, operation type (query, mutation or subscription) and status code, if possible.
The request.api_target should be set to graphql.
The request.data should be set to the raw GraphQL request payload, if possible. This should be guarded by an opt-in flag, e.g. sendDefaultPii.
The contexts.response.data should be set to the raw GraphQL response payload, only if there were errors. This should be guarded by an opt-in flag, e.g. sendDefaultPii and maxResponseBodySize.
Some frameworks may use a Stream object for the response. In this case, the object can't be consumed twice, so the SDK should try to check and clone the object, if possible.
Spans should be created for resolvers, if possible. These are sometimes also called data fetchers.
Spans should be created for data loaders, if possible.
The operation type should follow the Span Operation Conventions.
For extra (data) attributes for transactions and/or spans, there are Span Data and OTel GraphQL conventions.
Instrumenting APM for GraphQL will depend on the instrumented GraphQL library. If there are hooks available for it, the SDK should use them; otherwise, the SDK could try to monkeypatch the library or instrument the transport layer using heuristics, for example, if the URL ends with graphql, if there are HTTP Headers, etc.
If there are hooks available and the transport layer is also instrumented (e.g. Apollo Interceptors for GraphQL and Spring), the SDK should give preference to the layer that has more information and avoid creating duplicate transactions/spans, or merge the information, if possible.
Spring GraphQL has its own observation package.
GraphQL Java has its own instrumentation package.
Apollo GraphQL has its own tracing extensions. In this case it'd even be possible to create synthetic transactions and spans out of the tracing extension.
Changes in the product may be necessary, e.g. if request.api_target is set to graphql, the request.data and contexts.response.data should be rendered with syntax highlighting.
Performance issues can be created for GraphQL transactions and spans, for example, N+1, query complexity, etc.
GraphQL Performance for Clients is very similar to the implementation for Servers. The difference is that you'll create a span instead of a transaction.
Spans don't contain the request and response interfaces, but set the span description similarly to the transaction description.
Breadcrumbs should be added for each GraphQL operation (resolvers, data loaders, etc), if possible.
The Breadcrumb type should be graphql and the category should be the operation type, or graphql.operation if the operation type is not available.
Additional fields for breadcrumbs:
- data (all fields are optional but recommended):
  - operation_name - The GraphQL operation name
  - operation_type - The GraphQL operation type, i.e. query, mutation, subscription
  - operation_id - The GraphQL operation ID
Avoid setting the query String as part of the data field since the event can be dropped due to the size limit.
If additional fields are needed, the data field can be used to add more context, e.g. graphql.path, graphql.field, graphql.type, etc.
The category can also be adapted to its own type, e.g. graphql.resolver, graphql.data_loader, etc.
For resolvers or data fetchers a breadcrumb could have the following fields:
- type = graphql
- category = graphql.fetcher
- path - Path in the query, e.g. project/status
- field - Field being fetched, e.g. status
- type - Type being fetched, e.g. String
- object_type - Object type being fetched, e.g. Project
For data loaders a breadcrumb could have the following fields:
- type = graphql
- category = graphql.data_loader
- keys - Keys that should be loaded by the data loader
- key_type - Type of the key
- value_type - Type of the value
- name - Name of the data loader
If there are hooks available and the transport layer is also instrumented (e.g. Apollo Interceptors for GraphQL and Spring), the SDK should give preference to the layer that has more information and avoid creating duplicate breadcrumbs, or merge the information, if possible.
Ability for the SDK to attach the request body to events triggered during the execution of a request.
Users should be able to set a configuration option maxRequestBodySize to instruct the SDK how large a request body may be and still be attached. The SDK controls the actual size in bytes for each option:
- none (default)
- small - 1000 bytes
- medium - 10000 bytes
- always
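A minimal sketch of how the option could be evaluated, using the byte limits listed above. The function name and the way the body size is obtained are assumptions for illustration.

```python
# Byte limits for the size-bounded settings, taken from the list above
MAX_BODY_BYTES = {"small": 1000, "medium": 10000}


def should_attach_request_body(body_size, max_request_body_size="none"):
    """Decide whether a request body of `body_size` bytes gets attached."""
    if max_request_body_size == "none":
        return False
    if max_request_body_size == "always":
        return True
    return body_size <= MAX_BODY_BYTES[max_request_body_size]
```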
Some logging frameworks provide an option to set logging context. In Java this is called MDC (Mapped Diagnostic Context).
Users should be able to set a list of logging context entries in a configuration option contextTags to tell the SDK to convert the entries to Sentry tags.
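The conversion could look roughly like this, assuming the logging context is available as a plain dictionary. The function name is hypothetical, and converting values to strings is an assumption about how tags are stored.

```python
def tags_from_logging_context(logging_context, context_tags):
    """Pick the configured entries out of the logging context as tags.

    Entries not listed in `context_tags` are left out, and listed keys
    that are absent from the logging context are skipped.
    """
    return {
        key: str(logging_context[key])
        for key in context_tags
        if key in logging_context
    }
```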