Uncertainty-Aware and Reliable Neural MIMO
Receivers via Modular Bayesian Deep Learning
Tomer Raviv, Sangwoo Park, Osvaldo Simeone, and Nir Shlezinger
Abstract—Deep learning is envisioned to play a key role in
the design of future wireless receivers. A popular approach
to design learning-aided receivers combines deep neural net-
works (DNNs) with traditional model-based receiver algorithms,
realizing hybrid model-based data-driven architectures. Such
architectures typically include multiple modules, each carrying
out a different functionality dictated by the model-based receiver
workflow. Conventionally trained DNN-based modules are known
to produce poorly calibrated, typically overconfident, decisions.
Consequently, incorrect decisions may propagate through the ar-
chitecture without any indication of their insufficient accuracy. To
address this problem, we present a novel combination of Bayesian
deep learning with hybrid model-based data-driven architectures
for wireless receiver design. The proposed methodology, referred
to as modular Bayesian deep learning, is designed to yield
calibrated modules, which in turn improves both accuracy and
calibration of the overall receiver. We specialize this approach
for two fundamental tasks in multiple-input multiple-output
(MIMO) receivers: equalization and decoding. In the presence
of scarce data, the ability of modular Bayesian deep learning
to produce reliable uncertainty measures is consistently shown
to directly translate into improved performance of the overall
MIMO receiver chain.
I. INTRODUCTION
Deep neural networks (DNNs) were shown to successfully
learn complex tasks from data in domains such as computer
vision and natural language processing. This dramatic success
has led to a burgeoning interest in the design of DNN-
aided communications receivers [2]–[6]. While traditional re-
ceiver algorithms are based on the mathematical modelling of
signal transmission, propagation, and reception, DNN-based
receivers learn their mapping from data without relying on
such accurate modelling. Consequently, DNN-aided receivers
can operate efficiently in scenarios where the channel model
is unknown, highly complex, or difficult to optimize [7]. The
incorporation of deep learning into receiver processing is thus
often envisioned to be an important contributor to meeting the
rapidly increasing demands of wireless systems in throughput,
coverage, and robustness [8].
Parts of this work were presented at the 2023 IEEE International Conference
on Communications (ICC) as the paper [1]. T. Raviv and N. Shlezinger are
with the School of ECE, Ben-Gurion University of the Negev, Beer-Sheva,
Israel (e-mail: [email protected], [email protected]). S. Park and O.
Simeone are with King’s Communication Learning & Information Processing
(KCLIP) lab, Centre for Intelligent Information Processing Systems (CIIPS),
at the Department of Engineering, King’s College London, U.K. (email:
{sangwoo.park; osvaldo.simeone}@kcl.ac.uk). The work of T. Raviv and
N. Shlezinger was partially supported by the Israeli Innovation Authority.
The work of O. Simeone was partially supported by the European Union’s
Horizon Europe project CENTRIC (101096379), by an Open Fellowship of
the EPSRC (EP/W024101/1), by the EPSRC project (EP/X011852/1), and
by Project REASON, a UK Government funded project under the Future
Open Networks Research Challenge (FONRC) sponsored by the Department
of Science Innovation and Technology (DSIT).
Despite their promise in facilitating physical layer com-
munications, particularly or multiple-input multiple-output
(MIMO) systems, the deployment of DNNs faces several
challenges that may limit their direct applicability to wireless
receivers. The dynamic nature of wireless channels combined
with the limited computational and memory resources of wire-
less devices indicates that MIMO receiver processing is fun-
damentally distinct from conventional deep learning domains,
such as computer vision and natural language processing [9],
[10]. Specifically, the necessity for online training may arise
due to the time-varying settings of wireless communications.
Nonetheless, pilot data corresponding to new channels is
usually scarce. DNNs trained using conventional deep learn-
ing mechanisms, i.e., via frequentist learning, are prone to
overfitting in such settings [11]. Moreover, they are likely to
yield poorly calibrated outputs [12], [13], e.g., produce wrong
decisions with high confidence.
A candidate approach to design DNN-aided receivers that
can be trained with limited data incorporates model-based
receiver algorithms as a form of inductive bias into a hy-
brid model-based data-driven architecture [14]–[16]. One
such model-based deep learning methodology is deep unfold-
ing [17]. Deep unfolding converts an iterative optimizer into a
discriminative machine learning system by introducing train-
able parameters within each of a fixed number of iterations
[16], [18]. Deep unfolded architectures treat each iteration
as a trainable module, obtaining a modular and interpretable
DNN architecture [19], with each module carrying out a
specific functionality within a conventional receiver processing
algorithm. Modularity is informed by domain knowledge,
potentially enhancing the generalization capability of DNN-
based receivers [20]–[25].
Hybrid model-based data-driven methods require less data
to reliably train compared to black box architectures. Yet, the
resulting system may be negatively affected by the inclusion
of conventionally trained DNN-based modules, i.e., DNNs
trained via frequentist learning [13]. This follows from the
frequent overconfident decisions that are often produced by
such conventional DNNs, particularly when learning from
limited data sets. Such poorly calibrated DNN-based modules
may propagate inaccurate predictions through the architecture
without any indication of their insufficient accuracy.
The machine learning framework of Bayesian learning
accommodates learning techniques with improved calibra-
tion compared with frequentist learning [13], [26]–[28]. In
Bayesian deep learning, the parameters of a DNN are treated
as random variables. Doing so provides means to quantify
the epistemic uncertainty, which arises due to lack of training
data. Bayesian deep learning was shown to improve the
reliability of DNNs in wireless communication systems [29], [30]. Nonetheless, these existing studies adopt black-box DNN architectures. Thus, the existing approaches do not leverage the modularity of model-based data-driven receivers, and do not account for the ability to assign interpretable meaning to intermediate features exchanged in model-based deep learning architectures.

Fig. 1: The communication system studied in this paper.
In this work we present a novel methodology, referred to
as modular Bayesian deep learning, that combines Bayesian
learning with hybrid model-based data-driven architectures for
DNN-aided receivers. The proposed approach calibrates the
internal DNN-based modules, rather than merely calibrating
the overall system output. This has the advantage of yielding
well-calibrated soft outputs that can benefit downstream tasks,
as we empirically demonstrate.
Our main contributions are summarized as follows.
Modular Bayesian deep learning: We introduce a mod-
ular Bayesian deep learning framework that integrates
Bayesian learning with model-based deep learning. Ac-
counting for the modular architecture of hybrid model-
based data-driven receivers, Bayesian learning is applied
on a per-module basis, enhancing the calibration of soft
estimates exchanged between internal DNN modules.
The training procedure is based on dedicated sequential
Bayesian learning operation, which boosts calibration of
internal modules while encouraging subsequent modules
to exploit these calibrated features.
Modular Bayesian deep MIMO receivers: We special-
ize the proposed methodology to MIMO receivers. To this
end, we adopt two established DNN-aided architectures, the DeepSIC MIMO equalizer of [21] and the deep unfolded weighted belief propagation (WBP) soft decoder of [31], and derive their design as modular Bayesian deep learning systems.
Extensive experimentation: We evaluate the advantages
of modular Bayesian learning for deep MIMO receivers
based on DeepSIC and WBP. Our evaluation first con-
siders equalization and decoding separately, followed by
an evaluation of the overall receiver chain composed of
the cascade of equalization and decoding. The proposed
Bayesian modular approach is systematically shown to
outperform conventional frequentist learning, as well as
traditional Bayesian black-box learning, while being reliable and interpretable.
The remainder of this paper is organized as follows: Sec-
tion II formulates the considered MIMO system model, and re-
views some necessary preliminaries on Bayesian learning and
model-based deep receivers. Section III presents the proposed
methodology of modular Bayesian learning, and specializes it
for MIMO equalization using DeepSIC and for decoding using
the WBP. Our extensive numerical experiments are reported
in Section IV, while Section V provides concluding remarks.
Throughout the paper, we use boldface letters for multivariate quantities, e.g., $\boldsymbol{x}$ is a vector; calligraphic letters, such as $\mathcal{X}$, for sets, with $|\mathcal{X}|$ being the cardinality of $\mathcal{X}$; and $\mathbb{R}$ for the set of real numbers. The notation $(\cdot)^T$ stands for the transpose operation.
II. SYSTEM MODEL AND PRELIMINARIES
In this section, we describe the MIMO communication
system under study in Subsection II-A, and review the con-
ventional frequentist learning of MIMO receivers in Subsec-
tion II-B. Then, we briefly present basics of model-based
learning for deep receivers in Subsection II-C, and review our
main running examples of DeepSIC [21] and WBP [31].
A. System Model
We consider an uplink MIMO digital communication system
with K single-antenna users. The users transmit encoded mes-
sages, independently of each other, to a base station equipped
with N antennas. Each user of index k uses a linear block
error-correction code to encode a length-$m$ binary message $\boldsymbol{m}_k \in \{0,1\}^m$. Encoding is carried out by multiplication with a generator matrix $\boldsymbol{G}$, resulting in a length-$C$ ($C \geq m$) codeword given by $\boldsymbol{c}_k = \boldsymbol{G}^T \boldsymbol{m}_k \in \{0,1\}^C$. The same codebook is employed for all users. The messages are modulated, such that each symbol $s_k[i]$ at the $i$th time index is a member of the set of constellation points $\mathcal{S}$. The transmitted symbols form the $K \times 1$ vector $\boldsymbol{s}[i] = [s_1[i], \ldots, s_K[i]] \in \mathcal{S}^K$.
We adopt a block-fading model, i.e., the channel remains
constant within a block of duration dictated by the coherence
time of the channel. A single block of duration $T_{\rm tran}$ is generally divided into two parts: the pilot part, composed of $T_{\rm pilot}$ pilots, and the message part of $T_{\rm info} = T_{\rm tran} - T_{\rm pilot}$ information symbols. The corresponding indices for transmitted pilot symbols and information symbols are written as $\mathcal{T}_{\rm pilot} = \{1, \ldots, T_{\rm pilot}\}$ and $\mathcal{T}_{\rm info} = \{T_{\rm pilot}+1, \ldots, T_{\rm tran}\}$, respectively.
To accommodate complex and non-linear channel models,
we represent the channel input-output relationship by a generic conditional distribution. Accordingly, the corresponding received signal vector $\boldsymbol{y}[i] \in \mathbb{R}^N$ is determined as
$$\boldsymbol{y}[i] \sim P_{\boldsymbol{h}}(\boldsymbol{y}[i]\,|\,\boldsymbol{s}[i]), \qquad (1)$$
which is subject to the unknown conditional distribution $P_{\boldsymbol{h}}(\cdot|\cdot)$ that depends on the current channel parameters $\boldsymbol{h}$.
The received channel outputs in (1) are processed by the receiver, whose goal is to correctly recover each message. The estimate of the $k$th user message is denoted by $\hat{\boldsymbol{m}}_k$. This mapping can be generally divided into two different tasks: (i) equalization and (ii) decoding. Equalization maps the received signal $\boldsymbol{y}[i]$ into a vector of soft symbol estimates for each user, denoted $\hat{\boldsymbol{P}}_1[i], \ldots, \hat{\boldsymbol{P}}_K[i]$. Decoding is done for each user separately, obtaining an estimate $\hat{\boldsymbol{m}}_k$ of the transmitted message from the soft probability vectors $\{\hat{\boldsymbol{P}}_k[i]\}_i$ corresponding to the codeword. The latter often involves first obtaining a soft estimate for each bit in the message, where we use $\hat{L}_k[v]$ to denote the estimated probability that the $v$th bit in the $k$th message is one. The entire communication chain is illustrated in Fig. 1.
B. Frequentist Deep Receivers
Deep receivers employ DNNs to map the block of channel outputs into the transmitted messages, i.e., as part of the equalization and/or decoding operations. In conventional frequentist learning, optimization of the parameter vector $\boldsymbol{\varphi}$ dictating the receiver mapping is typically done by minimizing the cross-entropy loss over a labelled pair of inputs and outputs [3], i.e.,
$$\boldsymbol{\varphi}^{\rm freq} = \arg\min_{\boldsymbol{\varphi}} \mathcal{L}_{\rm CE}(\boldsymbol{\varphi}). \qquad (2)$$
For example, to optimize an equalizer (formulated here for a single-user case for brevity), one employs the $T_{\rm pilot}$ pilot symbols and their corresponding observed channel outputs to compute
$$\mathcal{L}_{\rm CE}(\boldsymbol{\varphi}) = -\frac{1}{|\mathcal{T}_{\rm pilot}|}\sum_{i\in\mathcal{T}_{\rm pilot}} \log P_{\boldsymbol{\varphi}}(s[i]\,|\,\boldsymbol{y}[i]). \qquad (3)$$
In (3), $P_{\boldsymbol{\varphi}}(s[i]\,|\,\boldsymbol{y}[i])$ is the soft estimate associated with the symbol $s[i]$ computed from $\boldsymbol{y}[i]$ by a DNN-aided equalizer with parameters $\boldsymbol{\varphi}$. When such an equalizer is used, the stacking of $\{P_{\boldsymbol{\varphi}}(s\,|\,\boldsymbol{y}[i])\}_{s\in\mathcal{S}}$ forms the soft estimate $\hat{\boldsymbol{P}}[i]$.
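To make the training rule (2)-(3) concrete, the following Python sketch trains a single-user DNN-aided soft equalizer over a set of pilots by minimizing the empirical cross-entropy. The toy two-layer classifier, the placeholder pilot tensors, and all hyperparameters are illustrative assumptions, not the architectures or settings used later in the paper.

```python
# Minimal sketch of frequentist training via (2)-(3) for a single-user soft equalizer.
import torch
import torch.nn as nn

num_const = 4            # |S|, e.g., QPSK
num_rx = 4               # N receive antennas
pilots = 384             # T_pilot

# Labelled pilots: channel outputs y[i] (real-valued features) and symbol indices s[i].
y_pilot = torch.randn(pilots, 2 * num_rx)          # placeholder for observed outputs
s_pilot = torch.randint(0, num_const, (pilots,))   # placeholder for transmitted symbols

equalizer = nn.Sequential(
    nn.Linear(2 * num_rx, 48), nn.ReLU(), nn.Linear(48, num_const)
)
optimizer = torch.optim.Adam(equalizer.parameters(), lr=5e-3)
loss_fn = nn.CrossEntropyLoss()                    # empirical average of -log P_phi(s[i]|y[i])

for _ in range(500):
    optimizer.zero_grad()
    logits = equalizer(y_pilot)                    # unnormalized soft estimates
    loss = loss_fn(logits, s_pilot)                # cross-entropy loss (3)
    loss.backward()
    optimizer.step()

# After training, softmax(logits) stacks {P_phi(s|y[i])} over s in S, i.e., P_hat[i].
soft_est = torch.softmax(equalizer(y_pilot), dim=-1)
```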
Calibration refers to the reliability of the soft estimates produced by a prediction model. In particular, a prediction model $P_{\boldsymbol{\varphi}}(s[i]\,|\,\boldsymbol{y}[i])$ as in the above DNN-aided equalizer example is perfectly calibrated if the conditional distribution of the transmitted symbols given the predictor's output satisfies the equality [12]
$$P\big(s[i] = \hat{s}[i]\,\big|\,P_{\boldsymbol{\varphi}}(\hat{s}[i]\,|\,\boldsymbol{y}[i]) = \pi\big) = \pi \quad \text{for all } \pi\in[0,1]. \qquad (4)$$
When (4) holds, the soft outputs of the prediction model coincide with the true underlying conditional distribution.
C. Hybrid Model-Based Data-Driven Architectures
As mentioned in Section I, two representative examples
of hybrid model-based data-driven architectures employed by
deep receivers are: (i) the DeepSIC equalizer of [21]; and
(ii) the unfolded WBP introduced in [31], [32] for channel decoding. As these architectures are employed as our main running examples in the sequel, we next briefly recall their formulation.

Algorithm 1: DeepSIC Inference
Input: Channel output $\boldsymbol{y}[i]$; Parameter vector $\boldsymbol{\varphi}^{\rm SIC}$.
Output: Soft estimates for all users $\{\hat{\boldsymbol{P}}_k[i]\}_{k=1}^{K}$.
1: Generate an initial guess $\{\hat{\boldsymbol{P}}^{(k,0)}\}_{k=1}^{K}$.
2: for $q \in \{1, \ldots, Q\}$ do
3:   for $k \in \{1, \ldots, K\}$ do
4:     Calculate $\hat{\boldsymbol{P}}^{(k,q)}$ by (5).
5: return $\{\hat{\boldsymbol{P}}_k[i] = \hat{\boldsymbol{P}}^{(k,Q)}\}_{k=1}^{K}$
1) DeepSIC: The DeepSIC architecture proposed in [21]
is a modular model-based DNN-aided MIMO demodulator,
based on the classic soft interference cancellation (SIC) al-
gorithm [33]. SIC operates in Q iterations. This equalization
algorithm refines an estimate of the conditional probability
mass function for the constellation points in S denoted by
ˆ
P
(k,q)
, for each symbol transmitted by a user of index k,
throughout the iterations q = 1, . . . , Q. The estimate is
generated for every user k {1, . . . , K} and every iteration
q {1, . . . , Q} by using the soft estimates {
ˆ
P
(l,q1)
}
l̸=k
of
the interfering symbols {s
l
[i]}
l̸=k
obtained in the previous
iteration q 1 as well as the channel output y[i]. This
procedure is illustrated in Fig. 2(a).
In DeepSIC, the interference cancellation and soft detection steps are implemented with DNNs. Specifically, the soft estimate of the symbol transmitted by the $k$th user in the $q$th iteration is calculated by a DNN-based classifying module with parameter vector $\boldsymbol{\varphi}^{(k,q)}$. Accordingly, the $|\mathcal{S}| \times 1$ probability mass function vector for user $k$ at iteration $q$ under time index $i$ can be obtained as
$$\hat{\boldsymbol{P}}^{(k,q)} = \Big\{ P_{\boldsymbol{\varphi}^{(k,q)}}\big(s\,\big|\,\boldsymbol{y}[i], \{\hat{\boldsymbol{P}}^{(l,q-1)}[i]\}_{l\neq k}\big) \Big\}_{s\in\mathcal{S}}, \qquad (5)$$
where DeepSIC is parameterized by the set of model parameter vectors $\boldsymbol{\varphi}^{\rm SIC} = \{\{\boldsymbol{\varphi}^{(k,q)}\}_{k=1}^{K}\}_{q=1}^{Q}$. The entire DeepSIC architecture is illustrated in Fig. 2(b) and its inference procedure is summarized in Algorithm 1.
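The following sketch mirrors Algorithm 1 and (5): each module is a small classifier that receives the channel output together with the previous-iteration soft estimates of the interfering users, and the final-iteration outputs serve as the soft symbol estimates. The module architecture and dimensions are illustrative placeholders, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

K, Q, num_const, num_rx = 4, 3, 4, 4

def make_module():
    # DNN module for user k at iteration q: its input is y[i] concatenated with the
    # soft estimates of the K-1 interfering users from the previous iteration.
    return nn.Sequential(nn.Linear(2 * num_rx + (K - 1) * num_const, 48),
                         nn.ReLU(), nn.Linear(48, num_const))

modules = [[make_module() for _ in range(K)] for _ in range(Q)]   # phi^(k,q)

def deepsic_infer(y):                                  # y: (batch, 2N) real features
    batch = y.shape[0]
    # Initial guess: uniform PMFs over S for every user.
    p_prev = [torch.full((batch, num_const), 1.0 / num_const) for _ in range(K)]
    for q in range(Q):
        p_next = []
        for k in range(K):
            interf = torch.cat([p_prev[l] for l in range(K) if l != k], dim=-1)
            logits = modules[q][k](torch.cat([y, interf], dim=-1))
            p_next.append(torch.softmax(logits, dim=-1))   # P_hat^(k,q) via (5)
        p_prev = p_next
    return p_prev                                      # {P_hat_k[i] = P_hat^(k,Q)}

soft = deepsic_infer(torch.randn(10, 2 * num_rx))
```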
In the original formulation in [21], DeepSIC is trained in a frequentist manner, i.e., by minimizing the cross-entropy loss (3). By treating each user $k = 1, \ldots, K$ equally, the resulting loss is given by
$$\mathcal{L}_{\rm CE}^{\rm SIC}(\boldsymbol{\varphi}) = \sum_{k=1}^{K} \mathcal{L}_{\rm CE}^{\rm MSIC}(\boldsymbol{\varphi}^{(k,Q)}), \qquad (6)$$
where the user-wise cross-entropy is evaluated over the pilot set $\mathcal{T}_{\rm pilot}$ as
$$\mathcal{L}_{\rm CE}^{\rm MSIC}(\boldsymbol{\varphi}^{(k,q)}) = -\frac{1}{|\mathcal{T}_{\rm pilot}|} \sum_{i\in\mathcal{T}_{\rm pilot}} \log P_{\boldsymbol{\varphi}^{(k,q)}}\big(s_k[i]\,\big|\,\boldsymbol{y}[i], \{\hat{\boldsymbol{P}}^{(l,q-1)}[i]\}_{l\neq k}\big). \qquad (7)$$
In [21], the soft estimates for each user of index $k$ were obtained via the final soft output, i.e., $\hat{\boldsymbol{P}}_k[i]$ is obtained as $\hat{\boldsymbol{P}}^{(k,Q)}$. It was shown in [21] that under such settings, DeepSIC can outperform frequentist black-box deep receivers in limited-pilot scenarios. However, its calibration performance was not discussed.

Fig. 2: (a) Iterative SIC; and (b) DeepSIC [21]. In this paper, we introduce modular Bayesian learning, which calibrates the internal modules of the DeepSIC architecture with the double goal of improving end-to-end accuracy and of producing well-calibrated soft outputs for the benefit of downstream tasks.
2) Weighted Belief Propagation: Belief propagation (BP)
[34] is an efficient algorithm for calculating the marginal
probabilities of a set of random variables whose relationship
can be represented as a graph. A graphical representation used
in channel coding is the Tanner graph of linear error-correcting
codes [35]. A Tanner graph is a bipartite graph with two types
of nodes: variable nodes and check nodes. The variable nodes
correspond to bits of the encoded message, and the check
nodes represent constraints that the message must satisfy.
In the presence of loops in the graph, BP is generally
suboptimal. Its performance can be improved by weighting
the messages via a learnable variant of the standard BP, called
WBP [31], [32]. In WBP, the messages passed between nodes
in the graphical model are weighted based on the relative soft-
information they represent. These weights are learned from
data by unfolding the message passing iterations of BP into
a machine learning architecture. In particular, the message
passing algorithm operates by iteratively passing messages
over edges from variables nodes to check nodes and vice versa.
As the decoder is applied to each user separately, we omit the user index $k$ in the following. To formulate the decoder, let the WBP message from variable node $v$ to check node $h$ at iteration $q$ be denoted by $M^{(q)}_{v\to h}$, and the message from check node $h$ to variable node $v$ by $M^{(q)}_{h\to v}$. The messages are initialized as $M^{(0)}_{v\to h} = M^{(0)}_{h\to v} = 0$ for all $(v,h)\in\mathcal{H}$, where $\mathcal{H}$ denotes all valid variable-check connections. For instance, for linear block codes, validity is determined by the element in the $v$th row and $h$th column of the parity check matrix. The input log-likelihood ratio (LLR) values to the decoder are initialized from the outputs of the detection phase.
Algorithm 2: WBP Inference
Input: LLR values $\boldsymbol{\ell}$; Parameter vector $\boldsymbol{\varphi}^{\rm WBP}$.
Output: Soft estimates for message bits $\hat{\boldsymbol{L}}$.
1: Initialize $M^{(0)}_{v\to h} = M^{(0)}_{h\to v} = 0$.
2: for $q \in \{1, \ldots, Q\}$ do
3:   for $(v,h) \in \mathcal{H}$ do
4:     Update $M^{(q)}_{v\to h}$ by (8).
5:   for $(v,h) \in \mathcal{H}$ do
6:     Update $M^{(q)}_{h\to v}$ by (9).
7: Calculate $\hat{\boldsymbol{L}}$ by (10) with $q = Q$.
8: return $\hat{\boldsymbol{L}}$
To this end, one first finds the soft estimate $\tilde{\boldsymbol{c}}\in\mathbb{R}^C$ of the codeword $\boldsymbol{c}$. Given the $i$th soft symbol probability $\hat{\boldsymbol{P}}[i]$, the soft estimate $\tilde{\boldsymbol{c}}$ can be found by translating $\hat{\boldsymbol{P}}[i]$ into $r = \log_2|\mathcal{S}|$ soft bit probabilities $\{\tilde{c}[i\cdot r], \ldots, \tilde{c}[i\cdot r + r - 1]\}$ with $\tilde{c}[v] = P(c[v] = 1\,|\,\hat{\boldsymbol{P}}[i])$ for $v = i\cdot r, \ldots, i\cdot r + r - 1$. Then the LLRs are computed by
$$\ell[v] = \log\frac{\tilde{c}[v]}{1-\tilde{c}[v]},$$
and we use the vector $\boldsymbol{\ell}$ to denote the stacking of $\ell[v]$ for all bit indices $v$ corresponding to a given codeword.
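A minimal sketch of this translation from symbol PMFs to soft bits and LLRs is given below; the bit labelling of the constellation points (here, a Gray-style map for QPSK) is an illustrative assumption.

```python
import numpy as np

def symbol_probs_to_llrs(p_hat, bit_map):
    """Translate symbol PMFs P_hat[i] into soft bits c_tilde and LLRs ell[v].

    p_hat:   (T, |S|) array, p_hat[i, s] = P(s | y[i]).
    bit_map: (|S|, r) array with the r = log2|S| bit labels of each constellation point.
    """
    eps = 1e-12
    # c_tilde[i*r + b] = P(c[i*r + b] = 1 | P_hat[i]): PMF mass of symbols whose b-th bit is 1.
    c_tilde = p_hat @ bit_map                      # (T, r)
    c_tilde = np.clip(c_tilde, eps, 1.0 - eps)
    llrs = np.log(c_tilde / (1.0 - c_tilde))       # ell[v] = log( c_tilde[v] / (1 - c_tilde[v]) )
    return llrs.reshape(-1)                        # stacked over the codeword

# Example with QPSK (r = 2); the Gray-mapped bit labels below are an illustrative assumption.
bit_map = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
p_hat = np.array([[0.7, 0.1, 0.1, 0.1], [0.05, 0.05, 0.1, 0.8]])
print(symbol_probs_to_llrs(p_hat, bit_map))
```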
After initialization, the iterative algorithm begins. Variable-to-check messages are updated according to the rule
$$M^{(q)}_{v\to h} = \tanh\Big(\frac{1}{2}\Big(\ell[v] + \sum_{(v,h')\in\mathcal{H},\, h'\neq h} W^{(q)}_{h',v}\, M^{(q-1)}_{h'\to v}\Big)\Big), \qquad (8)$$
with the parameters $\boldsymbol{\varphi}^{(q)} = \{\boldsymbol{W}^{(q)}\}$, where $\boldsymbol{W}^{(q)}$ is a weight matrix. The check-to-variable messages are updated by the non-weighted rule
$$M^{(q)}_{h\to v} = 2\,\mathrm{arctanh}\Big(\prod_{(v',h)\in\mathcal{H},\, v'\neq v} M^{(q)}_{v'\to h}\Big). \qquad (9)$$
The bit-wise probabilities are obtained from the messages via
$$\hat{L}^{(q)}[v] = \tanh\Big(\frac{1}{2}\Big(\ell[v] + \sum_{(v,h')\in\mathcal{H}} W^{(q)}_{h',v}\, M^{(q-1)}_{h'\to v}\Big)\Big). \qquad (10)$$
The update rules are performed for every combination of $(v,h)\in\mathcal{H}$ over $Q$ iterations. Finally, the codeword is estimated via a hard decision on the probability vector $\hat{\boldsymbol{L}}$, which is the stacking of the bit-wise probabilities $\hat{L}^{(Q)}[v]$ for every $v$ in the codeword. The entire inference procedure for a given WBP parameterization $\boldsymbol{\varphi}^{\rm WBP} = \{\boldsymbol{\varphi}^{(q)}\}_{q=1}^{Q}$ is summarized in Algorithm 2.
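A compact sketch of one WBP iteration, implementing the weighted variable-to-check rule (8) followed by the unweighted check-to-variable rule (9), is given below. The toy parity-check matrix and unit weights are illustrative, and numerical clipping is added for stability.

```python
import numpy as np

def wbp_iteration(ell, H, W, M_cv):
    """One WBP iteration following (8)-(9): a weighted variable-to-check update
    followed by the unweighted check-to-variable update.

    ell:  (C,) input LLRs.
    H:    (n_checks, C) binary parity-check matrix defining the valid edges.
    W:    (n_checks, C) learnable weights W^(q) on incoming check-to-variable messages.
    M_cv: (n_checks, C) check-to-variable messages from the previous iteration.
    """
    # Variable-to-check rule (8): for edge (v, h), sum the weighted incoming messages
    # from all checks h' != h, add the channel LLR, and take tanh(0.5 * ...).
    weighted = W * M_cv * H
    totals = ell[None, :] + weighted.sum(axis=0, keepdims=True) - weighted
    M_vc = np.tanh(0.5 * totals) * H
    # Check-to-variable rule (9): product over the other variables connected to check h.
    M_cv_new = np.zeros_like(M_cv)
    for h in range(H.shape[0]):
        vs = np.flatnonzero(H[h])
        for v in vs:
            prod = np.prod([M_vc[h, u] for u in vs if u != v])
            M_cv_new[h, v] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
    return M_vc, M_cv_new

# Toy usage on a small parity-check matrix; all values below are illustrative.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=float)
ell = np.random.randn(7)
W = np.ones_like(H)                 # unit weights recover standard (unweighted) BP
M_vc, M_cv = wbp_iteration(ell, H, W, np.zeros_like(H))
```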
The parameters of WBP, $\boldsymbol{\varphi}^{\rm WBP}$, are learned from data. Specifically, the decoder is trained to minimize the following cross-entropy multi-loss (cf. (3)) over the pilot set $\mathcal{C}_{\rm pilot} = \{1, \ldots, r \cdot T_{\rm pilot}\}$:
$$\mathcal{L}_{\rm CE}^{\rm WBP}(\boldsymbol{\varphi}) = \sum_{q=1}^{Q} \mathcal{L}_{\rm CE}^{\rm MWBP}(\boldsymbol{\varphi}^{(q)}), \qquad (11)$$
with the cross-entropy loss at iteration $q$ defined as
$$\mathcal{L}_{\rm CE}^{\rm MWBP}(\boldsymbol{\varphi}^{(q)}) = -\frac{1}{|\mathcal{C}_{\rm pilot}|} \sum_{v\in\mathcal{C}_{\rm pilot}} \log P_{\boldsymbol{\varphi}^{(q)}}\big(m[v]\,\big|\,\ell[v], \{M^{(q-1)}_{h'\to v}\}\big). \qquad (12)$$
In (12), $P_{\boldsymbol{\varphi}^{(q)}}(m[v]\,|\,\ell[v], \{M^{(q-1)}_{h'\to v}\})$ is the binary probability vector computed for the $v$th bit of the message, i.e., $m[v]$.

Algorithm 3: Bayesian DeepSIC
Input: Channel output $\boldsymbol{y}[i]$; Distribution over the entire architecture $p_{\rm b}(\boldsymbol{\varphi}^{\rm SIC})$; Ensemble size $J$.
Output: Soft estimates for all users $\{\hat{\boldsymbol{P}}_k[i]\}_{k=1}^{K}$.
1: for $j \in \{1, \ldots, J\}$ do
2:   Generate $\boldsymbol{\varphi}^{\rm SIC}_j \sim p_{\rm b}(\boldsymbol{\varphi}^{\rm SIC})$;
3:   $\{\hat{\boldsymbol{P}}_{k,j}[i]\}_{k=1}^{K} \leftarrow {\rm DeepSIC}(\boldsymbol{y}[i], \boldsymbol{\varphi}^{\rm SIC}_j)$;
4: return $\{\hat{\boldsymbol{P}}_k[i] = \frac{1}{J}\sum_{j=1}^{J} \hat{\boldsymbol{P}}_{k,j}[i]\}_{k=1}^{K}$
III. MODULAR BAYESIAN DEEP LEARNING
Bayesian learning is known to improve the calibration
of black-box DNN models by preventing overfitting [29],
[36]. In this section, we fuse Bayesian learning with hybrid
model-based data-driven deep receivers. To this end, we first
present the direct application of Bayesian learning to deep
receivers in Subsection III-A, and illustrate the approach for
DeepSIC and WBP in Subsection III-B. Then, we introduce
the proposed modular Bayesian framework, and specialize it
again to DeepSIC and WBP in Subsection III-C. We conclude
with a discussion presented in Subsection III-D.
A. Bayesian Learning
Bayesian learning treats the parameter vector $\boldsymbol{\varphi}$ of a machine learning architecture as a random vector. This random vector obeys a distribution $p_{\rm b}(\boldsymbol{\varphi})$ that accounts for the available data and for prior knowledge. Most commonly, the distribution is obtained by minimizing the free energy loss, also known as the negative evidence lower bound (ELBO) [13], via
$$p_{\rm b}(\boldsymbol{\varphi}) = \arg\min_{p'} \Big\{ \mathbb{E}_{\boldsymbol{\varphi}\sim p'(\boldsymbol{\varphi})}\big[\mathcal{L}_{\rm CE}(\boldsymbol{\varphi})\big] + \frac{1}{\beta}\,\mathrm{KL}\big(p'(\boldsymbol{\varphi})\,\|\,p_{\rm prior}(\boldsymbol{\varphi})\big) \Big\}. \qquad (13)$$
The Kullback-Leibler (KL) divergence term in (13), given by
$$\mathrm{KL}\big(p'(\boldsymbol{\varphi})\,\|\,p(\boldsymbol{\varphi})\big) \triangleq \mathbb{E}_{p'(\boldsymbol{\varphi})}\Big[\log\frac{p'(\boldsymbol{\varphi})}{p(\boldsymbol{\varphi})}\Big],$$
regularizes the optimization of $p_{\rm b}(\boldsymbol{\varphi})$ with a prior distribution $p_{\rm prior}(\boldsymbol{\varphi})$. The hyperparameter $\beta > 0$, known as the inverse temperature parameter, balances the regularization impact [11], [13].
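As a concrete illustration of minimizing a free-energy objective of the form (13), the sketch below uses a mean-field Gaussian variational distribution over the weights of a single linear layer, for which the KL term against a standard Gaussian prior has a closed form. This parameterization is chosen purely for illustration; it is not the MC dropout scheme adopted in the experiments of Section IV, and all dimensions and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    """Mean-field Gaussian over a weight matrix phi; prior p_prior = N(0, I)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        sigma = self.log_sigma.exp()
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterized sample phi ~ p(phi)
        return x @ w.t()

    def kl(self):
        # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
        sigma2 = (2 * self.log_sigma).exp()
        return 0.5 * (sigma2 + self.mu**2 - 1.0 - 2 * self.log_sigma).sum()

layer = BayesianLinear(8, 4)
opt = torch.optim.Adam(layer.parameters(), lr=5e-3)
ce = nn.CrossEntropyLoss()
beta = 1.0                                              # inverse temperature in (13); illustrative
x, y = torch.randn(384, 8), torch.randint(0, 4, (384,)) # placeholder pilot data

for _ in range(500):
    opt.zero_grad()
    # One-sample Monte Carlo estimate of E_{phi~p}[L_CE(phi)] plus the (1/beta)-scaled KL term.
    loss = ce(layer(x), y) + (1.0 / beta) * layer.kl()
    loss.backward()
    opt.step()
```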
Algorithm 4: Bayesian WBP
Input: LLR values $\boldsymbol{\ell}$; Distribution over the entire architecture $p_{\rm b}(\boldsymbol{\varphi}^{\rm WBP})$; Ensemble size $J$.
Output: Soft estimates for message bits $\hat{\boldsymbol{L}}$.
1: for $j \in \{1, \ldots, J\}$ do
2:   Generate $\boldsymbol{\varphi}^{\rm WBP}_j \sim p_{\rm b}(\boldsymbol{\varphi}^{\rm WBP})$;
3:   $\hat{\boldsymbol{L}}_j \leftarrow {\rm WBP}(\boldsymbol{\ell}, \boldsymbol{\varphi}^{\rm WBP}_j)$;
4: return $\hat{\boldsymbol{L}} = \frac{1}{J}\sum_{j=1}^{J} \hat{\boldsymbol{L}}_j$
Once optimization based on (13) is carried out, Bayesian learning aims at having the output conditional distribution approximate the stochastic expectation
$$P_{p_{\rm b}} = \mathbb{E}_{\boldsymbol{\varphi}\sim p_{\rm b}(\boldsymbol{\varphi})}\big[P_{\boldsymbol{\varphi}}\big]. \qquad (14)$$
The expected value in (14) is approximated by an ensemble decision, namely, by combining multiple predictions, each based on a different realization of $\boldsymbol{\varphi}$ sampled from the optimized distribution $p_{\rm b}(\boldsymbol{\varphi})$. The Bayesian prediction (14) can account for the epistemic uncertainty residing in the optimal model parameters. This way, it generally attains better calibration performance as compared to frequentist learning [29], [30].
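The ensemble approximation of (14) can be sketched with MC dropout [36], where keeping dropout active at inference makes each forward pass correspond to one realization of $\boldsymbol{\varphi}$ drawn from the learned distribution; the toy network and ensemble size below are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 48), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(48, 4))
model.train()      # keep dropout active at inference so each pass samples a new phi ~ p_b(phi)

def bayesian_predict(x, J=5):
    # Approximate (14): average the softmax outputs of J sampled parameter realizations.
    with torch.no_grad():
        preds = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(J)])
    return preds.mean(dim=0)

p_hat = bayesian_predict(torch.randn(10, 8))
```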
B. Conventional Bayesian Learning for Deep Receivers
The conventional application of Bayesian learning obtains a distribution $p_{\rm b}(\boldsymbol{\varphi})$ over the set of model parameter vectors. One can then proceed to inference, where ensembling over multiple realizations of the parameter vector, drawn from $p_{\rm b}(\boldsymbol{\varphi})$, leads to a more reliable estimate. Below, we specialize this approach to our main running examples of deep receiver architectures: DeepSIC and WBP.
1) DeepSIC: The DeepSIC equalizer is formulated in Algorithm 1 for a deterministic (frequentist) parameter vector. When conventional Bayesian learning is employed, one obtains a distribution $p_{\rm b}(\boldsymbol{\varphi}^{\rm SIC})$ over the set of model parameter vectors $\boldsymbol{\varphi}^{\rm SIC}$ by minimizing the free energy loss (13). Using this distribution, the prediction for the information symbols is computed as
$$P_{p_{\rm b}}(s_k[i]\,|\,\boldsymbol{y}[i]) = \mathbb{E}_{\boldsymbol{\varphi}^{\rm SIC}\sim p_{\rm b}(\boldsymbol{\varphi}^{\rm SIC})}\big[P_{\boldsymbol{\varphi}^{\rm SIC}}(s_k[i]\,|\,\boldsymbol{y}[i])\big], \qquad (15)$$
for each data symbol at $i\in\mathcal{T}_{\rm info}$, with the soft prediction $\hat{\boldsymbol{P}}_k$ following (5). The resulting inference procedure is described in Algorithm 3.
While the final prediction $P_{\boldsymbol{\varphi}^{\rm SIC}}(s_k[i]\,|\,\boldsymbol{y}[i])$ is obtained by ensembling the predictions of multiple parameter vectors $\boldsymbol{\varphi}^{\rm SIC}\sim p_{\rm b}(\boldsymbol{\varphi}^{\rm SIC})$, the intermediate predictions made during the SIC iterations, i.e., $\hat{\boldsymbol{P}}^{(k,q)}$ with $q < Q$, are not combined to approach a stochastic expectation. Thus, this form of Bayesian learning only encourages the outputs, i.e., the predictions at iteration $Q$, to be well calibrated, not accounting for the internal modular structure of the architecture. The soft estimates learned by the internal modules may still be overconfident, propagating inaccurate decisions to subsequent modules.
Algorithm 5: Modular Bayesian DeepSIC
Input: Channel output $\boldsymbol{y}[i]$; Distributions per module $\{\{p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)})\}_{k=1}^{K}\}_{q=1}^{Q}$; Ensemble size $J$.
Output: Soft estimates for all users $\{\hat{\boldsymbol{P}}_k[i]\}_{k=1}^{K}$.
1: Generate an initial guess $\{\hat{\boldsymbol{P}}^{(k,0)}\}_{k=1}^{K}$.
2: for $q \in \{1, \ldots, Q\}$ do
3:   for $k \in \{1, \ldots, K\}$ do
4:     for $j \in \{1, \ldots, J\}$ do   (Bayesian estimation per module)
5:       Generate $\boldsymbol{\varphi}^{(k,q)}_j \sim p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)})$;
6:       Calculate $\hat{\boldsymbol{P}}^{(k,q)}_j$ by (5) using $\boldsymbol{\varphi}^{(k,q)}_j$.
7:     $\hat{\boldsymbol{P}}^{(k,q)} = \frac{1}{J}\sum_{j=1}^{J} \hat{\boldsymbol{P}}^{(k,q)}_j$;
8: return $\{\hat{\boldsymbol{P}}_k[i] = \hat{\boldsymbol{P}}^{(k,Q)}\}_{k=1}^{K}$
2) WBP: Bayesian learning of WBP obtains a distribution $p_{\rm b}(\boldsymbol{\varphi}^{\rm WBP})$ by minimizing the free energy loss (13) over the pilot set $\mathcal{C}_{\rm pilot}$. Once trained, the message bits are decoded via
$$P_{p_{\rm b}}(m[v]\,|\,\ell[v]) = \mathbb{E}_{\boldsymbol{\varphi}^{\rm WBP}\sim p_{\rm b}(\boldsymbol{\varphi}^{\rm WBP})}\big[P_{\boldsymbol{\varphi}^{\rm WBP}}(m[v]\,|\,\ell[v])\big], \qquad (16)$$
for each data bit of index $v$ among the information bits. The resulting inference procedure is described in Algorithm 4. As noted for DeepSIC, the resulting formulation only considers the soft estimate representation taken at the final iteration of index $Q$ when approaching the stochastic expectation. Doing so fails to leverage the modular and interpretable operation of WBP to boost the calibration of its internal messages.
C. Modular Bayesian Learning for Deep Receivers
The conventional application of Bayesian deep learning is
concerned with calibrating the final output of a DNN. As
such, it does not exploit the modularity of model-based data-
driven architectures, and particularly the ability to relate their
internal features to probability measures. Here, we extend
Bayesian deep learning to leverage this architecture of modular
deep receivers. This is achieved by calibrating the intermediate
blocks via Bayesian learning, combined with a dedicated
sequential Bayesian learning procedure, as detailed next for
DeepSIC and WBP.
1) DeepSIC: During training, we optimize the distribution $p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)})$ over the parameter vector $\boldsymbol{\varphi}^{(k,q)}$ of each block of index $(k,q)$, i.e., $\boldsymbol{\varphi}^{(k,q)} \sim p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)})$, individually. These distributions are obtained by minimizing the free energy loss on a per-module basis as
$$p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)}) = \arg\min_{p'} \Big\{ \mathbb{E}_{\boldsymbol{\varphi}^{(k,q)}\sim p'(\boldsymbol{\varphi}^{(k,q)})}\big[\mathcal{L}_{\rm CE}^{\rm MSIC}(\boldsymbol{\varphi}^{(k,q)})\big] + \frac{1}{\beta}\,\mathrm{KL}\big(p'(\boldsymbol{\varphi}^{(k,q)})\,\|\,p_{\rm prior}(\boldsymbol{\varphi}^{(k,q)})\big)\Big\}, \qquad (17)$$
with the module-wise loss $\mathcal{L}_{\rm CE}^{\rm MSIC}(\boldsymbol{\varphi}^{(k,q)})$ defined in (7).
The core difference in the formulation of the training objective in (17) as compared to (15) is that a single module is considered, rather than the entire model. Accordingly, the learning is done in a sequential fashion, where we first use (17) to train the modules corresponding to iteration $q = 1$, followed by training the modules for $q = 2$, and so on. This sequential Bayesian learning utilizes the calibrated soft estimates computed using the data set at the $q$th iteration as inputs when training the modules at iteration $q+1$. Doing so not only improves the calibration of the intermediate modules, but also encourages them to learn with better-calibrated soft estimates of the interfering symbols.

Algorithm 6: Modular Bayesian WBP
Input: LLR values $\boldsymbol{\ell}$; Distributions per module $\{p_{\rm mb}(\boldsymbol{\varphi}^{(q)})\}_{q=1}^{Q}$; Ensemble size $J$.
Output: Soft estimates for message bits $\hat{\boldsymbol{L}}$.
1: Initialize $M^{(0)}_{v\to h} = M^{(0)}_{h\to v} = 0$.
2: for $q \in \{1, \ldots, Q\}$ do
3:   for $(v,h) \in \mathcal{H}$ do
4:     for $j \in \{1, \ldots, J\}$ do   (Bayesian estimation per module)
5:       Generate $\boldsymbol{\varphi}^{(q)}_j \sim p_{\rm mb}(\boldsymbol{\varphi}^{(q)})$;
6:       Compute $M^{(q)}_{v\to h, j}$ by (8) using $\boldsymbol{\varphi}^{(q)}_j$;
7:     $M^{(q)}_{v\to h} = \frac{1}{J}\sum_{j=1}^{J} M^{(q)}_{v\to h, j}$
8:   for $(v,h) \in \mathcal{H}$ do
9:     Update $M^{(q)}_{h\to v}$ by (9).
10: Calculate $\hat{\boldsymbol{L}}$ by (10) with $q = Q$.
11: return $\hat{\boldsymbol{L}}$
Once the modules corresponding to the $q$th iteration are trained, the soft estimate $\hat{\boldsymbol{P}}^{(k,q)}[i]$ for the $k$th symbol at the $q$th iteration is computed by taking into account the uncertainty of the corresponding model parameter vector $\boldsymbol{\varphi}^{(k,q)}$. The resulting expression is given by
$$\hat{\boldsymbol{P}}^{(k,q)}[i] = \Big\{ \mathbb{E}_{\boldsymbol{\varphi}^{(k,q)}\sim p_{\rm mb}(\boldsymbol{\varphi}^{(k,q)})}\Big[ P_{\boldsymbol{\varphi}^{(k,q)}}\big(s\,\big|\,\boldsymbol{y}[i], \{\hat{\boldsymbol{P}}^{(l,q-1)}[i]\}_{l\neq k}\big)\Big]\Big\}_{s\in\mathcal{S}},$$
where again the stochastic expectation is approximated via ensembling.
Having computed a possibly different distribution for the weights of each intermediate module, the prediction of the information symbols is computed by setting $\hat{\boldsymbol{P}}_k[i] = \hat{\boldsymbol{P}}^{(k,Q)}[i]$ for each data symbol at $i\in\mathcal{T}_{\rm info}$. The resulting inference algorithm for the trained modular Bayesian DeepSIC is summarized in Algorithm 5.
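The sequential per-module training of modular Bayesian DeepSIC can be sketched as follows, using MC-dropout modules whose ensembled (calibrated) outputs at iteration $q$ are fed as inputs when training iteration $q+1$. The architecture, data placeholders, and hyperparameters are illustrative, and the KL regularizer of (17) is omitted for brevity (it would be added to the per-module loss, scaled by $1/\beta$).

```python
import torch
import torch.nn as nn

K, Q, num_const, num_rx, J = 4, 3, 4, 4, 5          # illustrative dimensions

class DropoutModule(nn.Module):
    """MC-dropout stand-in for p_mb(phi^(k,q)); dropout stays active so each
    forward pass corresponds to a sampled parameter realization."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * num_rx + (K - 1) * num_const, 48),
                                 nn.ReLU(), nn.Dropout(0.2), nn.Linear(48, num_const))
    def forward(self, x):
        return self.net(x)

modules = [[DropoutModule() for _ in range(K)] for _ in range(Q)]
y_pilot = torch.randn(384, 2 * num_rx)               # placeholder pilot observations
s_pilot = torch.randint(0, num_const, (384, K))      # placeholder pilot symbols
ce = nn.CrossEntropyLoss()

p_prev = [torch.full((384, num_const), 1.0 / num_const) for _ in range(K)]
for q in range(Q):                                   # iterations are trained sequentially
    for k in range(K):                               # each module (k, q) is trained individually
        mod = modules[q][k]
        opt = torch.optim.Adam(mod.parameters(), lr=5e-3)
        x = torch.cat([y_pilot] + [p_prev[l] for l in range(K) if l != k], dim=-1).detach()
        for _ in range(500):                         # per-module objective, cf. (17) and (7)
            opt.zero_grad()
            ce(mod(x), s_pilot[:, k]).backward()
            opt.step()
    with torch.no_grad():                            # ensembled estimates feed iteration q + 1
        p_prev = [torch.stack([torch.softmax(modules[q][k](
                      torch.cat([y_pilot] + [p_prev[l] for l in range(K) if l != k], dim=-1)),
                      dim=-1) for _ in range(J)]).mean(dim=0)
                  for k in range(K)]
```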
2) Weighted Belief Propagation: Each internal message of BP represents the belief on the value of a variable or check node. Modelling each message exchange iteration via a Bayesian deep architecture enables a more reliable exchange of such belief values. By doing so, one can prevent an undesired quick convergence of the beliefs to polarized values stemming from overconfident estimates, thus enhancing the reliability of the final estimate.

Carrying out WBP via modular Bayesian deep learning requires learning a distribution on a per-module basis. Specifically, the per-module distribution is obtained by minimizing the free energy loss per module, i.e., each module $q$ solves
$$p_{\rm mb}(\boldsymbol{\varphi}^{(q)}) = \arg\min_{p'} \Big\{ \mathbb{E}_{\boldsymbol{\varphi}^{(q)}\sim p'(\boldsymbol{\varphi}^{(q)})}\big[\mathcal{L}_{\rm CE}^{\rm MWBP}(\boldsymbol{\varphi}^{(q)})\big] + \frac{1}{\beta}\,\mathrm{KL}\big(p'(\boldsymbol{\varphi}^{(q)})\,\|\,p_{\rm prior}(\boldsymbol{\varphi}^{(q)})\big)\Big\},$$
where the module-wise loss $\mathcal{L}_{\rm CE}^{\rm MWBP}(\boldsymbol{\varphi}^{(q)})$ is defined in (12).
As detailed for modular Bayesian DeepSIC, training is carried
out via sequential Bayesian learning, gradually learning the
distribution of the parameters of each iteration, while using
its calibrated outputs when training the parameters of the
subsequent iteration.
In particular, each intermediate probability value is calculated based on realizations of parameters drawn from the obtained distribution, i.e.,
$$\hat{L}^{(q)}[v] = \mathbb{E}_{\boldsymbol{\varphi}^{(q)}\sim p_{\rm mb}(\boldsymbol{\varphi}^{(q)})}\Big[ P_{\boldsymbol{\varphi}^{(q)}}\big(m[v]\,\big|\,\ell[v], \{M^{(q-1)}_{h'\to v}\}\big)\Big],$$
where ensembling is used to approximate the expected value. The proposed inference procedure for modular Bayesian WBP is described in Algorithm 6.
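The ensembled variable-to-check update underlying Algorithm 6 can be sketched as follows: rule (8) is evaluated once per sampled weight realization and the resulting messages are averaged. The toy parity-check matrix and the Gaussian perturbations standing in for samples from $p_{\rm mb}(\boldsymbol{\varphi}^{(q)})$ are illustrative assumptions.

```python
import numpy as np

def modular_bayesian_vc_update(ell, H, weight_samples, M_cv):
    """Ensembled variable-to-check update of Algorithm 6: rule (8) is evaluated
    once per sampled weight realization and the messages are averaged.

    weight_samples: list of J weight matrices, standing in for draws from p_mb(phi^(q)).
    """
    msgs = []
    for W in weight_samples:
        weighted = W * M_cv * H
        totals = ell[None, :] + weighted.sum(axis=0, keepdims=True) - weighted
        msgs.append(np.tanh(0.5 * totals) * H)      # rule (8) with the sampled weights
    return np.mean(msgs, axis=0)                    # ensemble average replaces M^(q)_{v->h}

# Toy usage: J = 3 random perturbations around unit weights (illustrative only).
H = np.array([[1, 1, 0, 1], [0, 1, 1, 1]], dtype=float)
ell = np.random.randn(4)
samples = [1.0 + 0.1 * np.random.randn(*H.shape) for _ in range(3)]
M_vc = modular_bayesian_vc_update(ell, H, samples, np.zeros_like(H))
```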
D. Discussion
The core novelty of the algorithms formulated above lies
in the introduction of the concept of modular Bayesian deep
learning, specifically designed for hybrid model-based data-
driven receivers, such as DeepSIC and WBP. Unlike conven-
tional Bayesian approaches (see Algorithms 3 and 4) that focus
solely on the output, our method applies Bayesian learning to
each intermediate module, leading to increased reliability and
performance at every stage of the network. Algorithms 5 and
6 illustrate the Bayesian estimation in intermediate layers.
Overall, this iterative gain in reliability propagates throughout the model, resulting in higher performance and reliability in classification tasks such as detection and decoding. The
consistent benefits of modular Bayesian learning are demon-
strated through simulation studies in Section IV.
Our training algorithm relies on a sequential operation
which gradually learns the distribution of each iteration-based
module in a deep unfolded architecture, while using the
calibrated predictions of preceding iterations. This learning
approach has three core gains: (i) it facilitates individually
assessing each module based on its probabilistic prediction;
(ii) it enables the trainable parameters of subsequent iterations
to adapt to already calibrated predictions of their preceding
modules; and (iii) it facilitates an efficient usage of the
available limited data obtained from pilots, as it reuses the
entire data set for training each module. Specifically, when
training each module individually, the data set is used to adapt
a smaller number of parameters as compared to jointly training
the entire architecture as in conventional end-to-end training.
Nonetheless, this form of learning is expected to be more time consuming than conventional end-to-end frequentist learning, due to its sequential operation combined with the additional computations required by Bayesian learning (13). While one
can possibly reduce training latency via, e.g., optimizing the
learning algorithm [19], such extensions of our framework are
left for future research.
IV. NUMERICAL EVALUATIONS
In this section we numerically quantify the benefits of the
proposed modular Bayesian deep learning framework as com-
pared to both frequentist learning and conventional Bayesian
learning. We focus on uplink MIMO communications with
DeepSIC detection and WBP decoding. We note that the
relative performance of DeepSIC and WBP with frequentist
learning in comparison to state-of-the-art receiver methods was
studied in [21] and [31], respectively. Accordingly, here we
focus on evaluating the performance gains of a receiver archi-
tecture based on DeepSIC and WBP when modular Bayesian
learning is adopted, using as benchmarks frequentist learning
and standard, non-modular, Bayesian learning. Specifically,
after formulating the experimental setting and methods in Sub-
section IV-A, we study the effect of modular Bayesian learning on equalization and decoding separately in Subsections IV-B and IV-C, respectively. Then, we assess the calibration of modular Bayesian equalization and its contribution to the downstream soft decoding task in Subsection IV-D, after which the full modular Bayesian receiver is evaluated in Subsection IV-E.
A. Architecture and Learning Methods
Architectures: The DNN-aided equalizer used in this study
is based on the DeepSIC architecture [21], with DNN-modules
comprised of three fully connected (FC) layers with a hidden
layer size of 48 and ReLU activations, unrolled over Q = 3 iterations. The decoder is based on WBP [32], implemented over Q = 5 iterations with learnable parameters corresponding to the messages of the variable-to-check layers (8). Choosing Q = 5 layers was empirically observed to yield a satisfactorily high performance gain over vanilla BP [31], [32].
Learning Methods: For both detection and decoding, we
compare the proposed modular Bayesian learning to both
frequentist learning and Bayesian learning. Specifically, we
employ the following three learning schemes:
Frequentist (F): This method, labelled as “F” in the
figures, applies the conventional learning for training
DeepSIC and WBP as detailed in Subsection II-B.
Bayesian (B): The conventional Bayesian receiver, re-
ferred to as “B”, carries out optimization of the distri-
bution p
b
(φ) in an end-to-end fashion as described in
Subsection III-B.
Modular Bayesian (MB): The proposed modular
Bayesian learning scheme, introduced in
Subsection III-C, that applies Bayesian learning
separately to each module, and is labelled as “MB” in
the figures.
For the Bayesian learning schemes “B” and “MB”, we adopt the Monte Carlo (MC) dropout scheme [36], [37], which parameterizes the distribution $p_{\rm b}(\boldsymbol{\varphi})$ in terms of a nominal model parameter vector and a dropout probability vector. During Bayesian training, we set the inverse temperature parameter to $\beta = 10^{-4}$, and during Bayesian inference we set the ensemble size to $J = 5$ for DeepSIC and to $J = 3$ for WBP. For all DeepSIC variants, optimization is done via Adam with 500 iterations and a learning rate of $5\cdot 10^{-3}$. For WBP, Adam with 500 iterations and a learning rate of $10^{-3}$ was used. These values were set empirically to ensure convergence. (The source code used in our experiments is available at https://github.com/tomerraviv95/bayesian-learning-in-receivers-for-decoding.)

(a) QPSK constellation (b) 8-PSK constellation
Fig. 3: Static linear synthetic channel: SER of DeepSIC as a function of the SNR for different training methods. At each block, training is done with 384 pilots, and the SER is computed on 14,976 information symbols from the QPSK or 8-PSK constellation. Results are averaged over 10 blocks per point.
B. Detection Performance
We begin by evaluating the proposed modular Bayesian
deep learning framework for symbol detection using the Deep-
SIC equalizer. We consider two different types of channels:
(i) a static synthetic linear channel, and (ii) a time-varying
channel obtained from the physically compliant COST simu-
lator [38], both with additive Gaussian noise. The purpose of
this study is to illustrate the benefits of employing the modular
Bayesian approach to improve the detection performance in
terms of symbol error rate (SER).
1) Static Linear Synthetic Channels: The considered input-output relationship of the memoryless Gaussian MIMO channel is given by
$$\boldsymbol{y}[i] = \boldsymbol{H}\boldsymbol{s}[i] + \boldsymbol{w}[i], \qquad (18)$$
where $\boldsymbol{H}$ is a fixed $N \times K$ channel matrix, and $\boldsymbol{w}[i]$ is a complex white Gaussian noise vector. The matrix $\boldsymbol{H}$ models spatial exponential received-power decay, and its entries are given by $(\boldsymbol{H})_{n,k} = e^{-|n-k|}$, for each $n\in\{1,\ldots,N\}$ and $k\in\{1,\ldots,K\}$. The pilots and the transmitted symbols are generated in a uniform i.i.d. manner over two different constellations: the first is the quadrature phase shift keying (QPSK) constellation, i.e., $\mathcal{S} = \{(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}})\}$, and the second is the 8-ary phase shift keying (8-PSK) constellation, i.e., $\mathcal{S} = \{e^{\frac{j\pi\ell}{4}} \,:\, \ell\in\{0,\ldots,7\}\}$. For both cases, the number of users and antennas are set as $K = N = 4$.
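For reference, a minimal sketch of simulating the channel (18) with the exponential-decay matrix and QPSK inputs is given below; the SNR-to-noise-variance mapping and block length are illustrative assumptions.

```python
import numpy as np

K = N = 4
# Exponential spatial decay of (18): (H)_{n,k} = exp(-|n - k|).
n_idx, k_idx = np.meshgrid(np.arange(N), np.arange(K), indexing="ij")
H = np.exp(-np.abs(n_idx - k_idx))

# K users transmitting QPSK symbols over T channel uses, plus complex Gaussian noise.
T, snr_db = 1000, 12.0
s = (np.random.choice([-1.0, 1.0], (K, T)) + 1j * np.random.choice([-1.0, 1.0], (K, T))) / np.sqrt(2)
sigma = 10 ** (-snr_db / 20)                        # assumes unit symbol energy per user
w = sigma * (np.random.randn(N, T) + 1j * np.random.randn(N, T)) / np.sqrt(2)
y = H @ s + w                                       # memoryless MIMO relationship (18)
```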
We evaluate the accuracy of hard detection in terms of the SER, i.e., as the average $\frac{1}{K}\sum_{k}\mathbb{E}[\mathbb{1}(\hat{s}_k[i] \neq s_k[i])]$, with $\mathbb{1}(\cdot)$ being the indicator function, and where the expectation is estimated using the transmitted symbols and the noise in (18). Specifically, the block is composed of $T_{\rm pilot} = 384$ pilot symbols and $T_{\rm info} = 14{,}976$ information symbols, for a total of $T_{\rm tran} = 15{,}360$ symbols in a single block for both constellations. The pilots amount to only 2.5% of the total block size.
The resulting SER performance versus signal-to-noise ratio (SNR) is reported in Fig. 3. The gains of the modular
Bayesian DeepSIC (“MB”) are apparent. Specifically, “MB”
consistently outperforms the frequentist DeepSIC (“F”) by up
to 0.7 dB under QPSK in medium-to-high SNRs, and by up
to 1.5 dB when the 8-PSK constellation is considered. The
explanation for this performance gap lies in the severe data-
deficit conditions. Due to limited data, the frequentist and
the black-box Bayesian (“B”) methods are severely overfitted,
particularly in the 8-PSK case. In particular, conventional
Bayesian learning succeeds in calibrating the output of the
receiver, but not of the intermediate blocks, increasing the
reliability at the expense of the accuracy.
It is also noted that the gain of the proposed “MB” approach
as compared to “F” and “B” becomes more pronounced with
an increasing SNR level. To understand this behavior, it is
useful to decompose the uncertainty that affects data-driven
solutions into aleatoric uncertainty and epistemic uncertainty.
Aleatoric uncertainty is the inherent uncertainty present in
data, e.g., due to channel noise; while epistemic uncertainty
is the uncertainty that arises from lack of training data. Since
both the frequentist and Bayesian learning suffer equally from
aleatoric uncertainty, the gain of Bayesian learning becomes
highlighted with reduced aleatoric uncertainty, i.e., with an
increased SNR (see, e.g., [13, Sec. 4.5.2] for a detailed
discussion).
2) Time-Varying Linear COST Channels: Next, we evaluate
the proposed model-based Bayesian scheme under the time-
varying COST 2100 geometry-based stochastic channel model
[38]. The simulated setting represents multiple users moving in
an indoor setup while switching between different microcells.
(a) QPSK constellation (b) 8-PSK constellation
Fig. 4: Time-varying COST channel: SER of DeepSIC as a function of the SNR for different training methods. At each block, training is done with 384 pilots, and the SER is computed on 14,976 information symbols from the QPSK or 8-PSK constellation. Results are averaged over 10 blocks per point.

Succeeding in this scenario requires high adaptivity, since there is considerable variability in the channels observed in
different blocks. The employed channel model has the channel
coefficients change in a block-wise manner. The instantaneous
channel matrix is simulated using the wide-band indoor hall
5 GHz setting of COST 2100 with single-antenna elements,
which creates 4 × 4 = 16 narrow-band channels when setting
K = N = 4.
For the same hyperparameters as in Subsection IV-B1, in
Fig. 4, we report the SER as a function of the SNR for both
QPSK and 8-PSK after the transmission of 10 blocks. The
main conclusion highlighted above, in the synthetic channel, is
confirmed in this more realistic setting. It is systematically ob-
served that modular Bayesian learning yields notable SER im-
provements, particularly for higher-order constellations, where
overfitting is more pronounced.
C. Decoding Performance
We proceed to evaluating the contribution of modular
Bayesian learning to channel decoding. To that end, prior
to the symbols’ transmission, the message is encoded via a (128, 64) polar code (as in the 5G standard). The WBP
decoder is either trained using frequentist (“F”), Bayesian
(“B”), or modular Bayesian (“MB”) methods, and is applied
following soft equalization using DeepSIC. We henceforth use
the notation “X/Y” with X,Y {F, B, MB} to indicate that
the soft equalizer whose outputs are fed to WBP for decoding
is DeepSIC trained using method “X”, while WBP is trained
using “Y”. As our focus in this subsection is only on the
decoder, we consider frequentist detection only. In all cases,
we trained DeepSIC, and then trained the WBP decoder on the
soft outputs obtained by the corresponding DeepSIC detector
according to (11). This way, the decoder is optimized to make
use of the detector’s soft outputs.
To evaluate modular Bayesian learning of channel decoders,
we consider a binary phase shift keying (BPSK) constellation.
We use 1,536 pilot symbols, while the rest of the hyperparameters are chosen as in Subsection IV-B1. For the considered setup, Fig. 5 depicts the decoding bit error rate (BER) versus SNR.
Fig. 5: Decoding performance: BER as a function of SNR for
either a frequentist, a Bayesian or a modular Bayesian WBP
decoder. DeepSIC is trained by the frequentist method.
Large gains in decoding are observed when employing the modular Bayesian method. This stands in contrast to the black-box
Bayesian approach, which may degrade performance at high
SNR values. This behavior is in line with a similar observation
regarding DeepSIC trained in an end-to-end frequentist manner
with a low number of training samples, as reported in [21,
Fig. 10].
D. Calibration and Its Effect on Decoding Gain
As briefly mentioned in Subsection III-D, applying modular
Bayesian learning in equalization can also benefit decoding
performance for the two following reasons: (i) increased
accuracy for hard detection, i.e., lower SER (as reported in
Subsection IV-B); and (ii) increased calibration performance
for soft detection which results in more reliable LLRs provided
to the decoders. To evaluate the individual contribution of each
of these factors, we next compute the expected calibration
error (ECE) [39], which is conventionally used as a scalar
measure of calibration performance (see also [12] for other calibration measures), and evaluate its contribution to soft decoding.

(a) SER vs SNR (b) ECE vs SNR (c) BER vs SNR
Fig. 6: Analysis of decoding gain: SER, ECE, and BER as a function of SNR for frequentist and modular Bayesian detectors followed by a frequentist decoder.

(a) SER vs DeepSIC Iterations (Q) (b) ECE vs DeepSIC Iterations (Q) (c) BER vs DeepSIC Iterations (Q)
Fig. 7: Analysis of decoding gain: SER, ECE, and BER as a function of the number of iterations in DeepSIC for frequentist, Bayesian, and modular Bayesian detectors followed by a frequentist decoder.
To formulate the ECE measure, let us partition the interval $[0,1]$ of values of the confidence level, i.e., of the final soft output $\hat{\boldsymbol{P}}_k$ (obtained from either frequentist learning, Bayesian learning, or modular Bayesian learning), into $R$ intervals $\{(0, \frac{1}{R}], \ldots, (\frac{R-1}{R}, 1]\}$. Define the $r$th bin as the collection of information-symbol indices whose corresponding confidence values lie in the $r$th interval, i.e.,
$$\mathcal{T}_r = \Big\{ i\in\mathcal{T}_{\rm info},\, k \,:\, P(\hat{s}_k[i]\,|\,\boldsymbol{y}_k[i]) \in \Big(\frac{r-1}{R}, \frac{r}{R}\Big] \Big\}. \qquad (19)$$
Accordingly, the within-bin accuracy and within-bin confidence are respectively computed for the $r$th bin as
$$\mathrm{Acc}(\mathcal{T}_r) = \frac{1}{|\mathcal{T}_r|}\sum_{i\in\mathcal{T}_r,\,k} \mathbb{1}(\hat{s}_k[i] = s_k[i]), \qquad (20)$$
$$\mathrm{Conf}(\mathcal{T}_r) = \frac{1}{|\mathcal{T}_r|}\sum_{i\in\mathcal{T}_r,\,k} P(\hat{s}_k[i]\,|\,\boldsymbol{y}_k[i]). \qquad (21)$$
For a perfectly calibrated model as per (4), the within-bin accuracy $\mathrm{Acc}(\mathcal{T}_r)$ must equal the within-bin confidence $\mathrm{Conf}(\mathcal{T}_r)$ for every bin $r = 1,\ldots,R$, given a sufficiently large set of information symbols $\mathcal{T}_{\rm info}$. The ECE computes the average absolute difference between the within-bin accuracy and the within-bin confidence, weighted by the number of samples that lie in each bin, i.e.,
$$\mathrm{ECE} = \sum_{r=1}^{R} \frac{|\mathcal{T}_r|}{\sum_{r'=1}^{R}|\mathcal{T}_{r'}|}\,\big|\mathrm{Acc}(\mathcal{T}_r) - \mathrm{Conf}(\mathcal{T}_r)\big|. \qquad (22)$$
The smaller the ECE, the better calibrated the detector. In the following analysis, we set the number of confidence bins to $R = 10$.
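A straightforward implementation of the ECE in (19)-(22) is sketched below for a single user; the toy confidence and correctness vectors in the usage line are illustrative.

```python
import numpy as np

def expected_calibration_error(conf, correct, num_bins=10):
    """ECE of (22): conf[i] is the confidence of decision i (its max soft output),
    correct[i] is 1 if the hard decision was right, else 0."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece, total = 0.0, len(conf)
    for r in range(num_bins):
        # Bin T_r collects decisions whose confidence lies in ((r-1)/R, r/R], cf. (19).
        in_bin = (conf > edges[r]) & (conf <= edges[r + 1])
        if in_bin.any():
            acc = correct[in_bin].mean()           # within-bin accuracy (20)
            avg_conf = conf[in_bin].mean()         # within-bin confidence (21)
            ece += (in_bin.sum() / total) * abs(acc - avg_conf)
    return ece

# Toy usage with made-up confidences and outcomes.
print(expected_calibration_error([0.9, 0.8, 0.55, 0.95], [1, 1, 0, 1]))
```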
In Fig. 6, we study the impact of the learning method for
the detector by plotting the SER, ECE, and BER for frequen-
tist and modular Bayesian DeepSIC. Here, we implemented
DeepSIC with a single iteration Q = 1, followed by a standard
frequentist decoder. Note that in this case, modular Bayesian
DeepSIC and Bayesian DeepSIC coincide. The advantage of
Bayesian learning for the detector is seen to be exclusively
in terms of ECE, since the SER is similar for frequentist
and modular Bayesian DeepSIC across the considered range
of SNRs. The improved calibration is observed to yield a
decrease in the BER as the decoding module can benefit from
the more reliable soft outputs of the Bayesian detector, as can
be clearly observed in Figs. 6b-6c.
Next, we investigate how reliable information exchange
between the modules within the detector affects the hard-
decision performance. Towards this goal, in Fig. 7, we vary
the modular structure of the DeepSIC detector by changing the
number of iterations Q. We set the SNR to 15 dB, and measure
the SER, ECE, and BER for Q {1, 2, 3}. With a higher
number of iterations Q > 1, modular Bayesian DeepSIC is
better able to exploit the modular structure, achieving both
a lower SER and a lower ECE as compared to frequentist
and Bayesian DeepSIC. Its improved reliability and SER are
jointly translated at the output of the decoder into a lower
BER, as demonstrated in Fig. 7.
E. Overall Receiver Evaluation
We conclude our numerical evaluation by assessing the contribution of adopting modular Bayesian learning in both detection and decoding. To that end, we compare the “F/F”, “F/MB”, “MB/F”, and “MB/MB” methods for 8-PSK, with either the static synthetic channel from Subsection IV-B1 or the time-varying COST channel from Subsection IV-B2, using the same hyperparameters.

(a) SER vs SNR (b) ECE vs SNR (c) BER vs SNR
Fig. 8: Overall evaluation for the synthetic channel: SER, ECE and BER versus SNR for the combination of either a frequentist or modular Bayesian DeepSIC followed by either a frequentist or modular Bayesian WBP decoder.

(a) SER vs SNR (b) ECE vs SNR (c) BER vs SNR
Fig. 9: Overall evaluation for the COST channel: SER, ECE and BER versus SNR for the combination of either a frequentist or modular Bayesian DeepSIC followed by either a frequentist or modular Bayesian WBP decoder.
The performance achieved by the overall DNN-aided re-
ceiver for different training schemes, when applied in the
channel of Subsection IV-B1, is reported in Fig. 8. In par-
ticular, Fig. 8a shows that the gain achievable by adopting our
scheme is up to 1.5dB in SER. Due to this SER gain and the
improved ECE as shown in Fig. 8b, the “MB/F” scheme in Fig. 8c is shown to achieve a similar gain in BER. Moreover, adopting modular Bayesian learning in both detection and decoding (“MB/MB”) is observed to provide consistent additional gains of up to 0.25 dB.
Fig. 9 reports the results for the COST channel model.
We note that Fig. 9 further reinforces the superiority of our scheme, showing that the gains of the “MB/MB” method over “F/F” are also apparent over a more practical channel. Under
this channel, the gains of adopting modular Bayesian decoding
are higher, while the gains of modular Bayesian detection are
slightly smaller, with both contributing up to 0.8dB each.
V. CONCLUSION
We have introduced modular Bayesian learning, a novel
integration of Bayesian learning with hybrid model-based
data-driven architectures, for wireless receiver design. The
proposed approach harnesses the benefits of Bayesian learning
in producing well-calibrated soft outputs for each internal
module, hence resulting in performance improvement via
reliable information exchange across the internal modules. We
have shown that applying modular Bayesian learning to the
system with a cascade of modules, in which each module is
composed of multiple internal modules, can yield a significant
performance gain, e.g., BER gains of up to 1.6 dB.
Our proposed modular Bayesian learning, integrated with
model-based deep learning, is designed for deep receivers.
However, its underlying design principles are expected to be
useful for a broad range of tasks where modular deep archi-
tectures are employed. These include, e.g., modular DNNs
designed for dynamic systems [40], [41] and array signal
processing tasks [42]. Moreover, our derivation focuses on improving reliability via Bayesian learning. However, Bayesian learning may not improve the reliability of the modules when its underlying assumptions do not hold, e.g., under prior misspecification and/or likelihood misspecification [43]–[45]. This gives rise to a potential extension that replaces Bayesian learning with robust Bayesian learning [29], possibly combined with conformal prediction [46]–[48]. These extensions are left for future work.
REFERENCES
[1] T. Raviv, S. Park, O. Simeone, and N. Shlezinger, “Modular model-
based Bayesian learning for uncertainty-aware and reliable deep MIMO
receivers, in Proc. IEEE ICC, 2023.
[2] D. Gündüz, P. de Kerret, N. D. Sidiropoulos, D. Gesbert, C. R. Murthy, and M. van der Schaar, “Machine learning in the air,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2184–2199, 2019.
[3] O. Simeone, A very brief introduction to machine learning with ap-
plications to communication systems, IEEE Trans. on Cogn. Commun.
Netw., vol. 4, no. 4, pp. 648–664, 2018.
[4] A. Balatsoukas-Stimming and C. Studer, “Deep unfolding for commu-
nications systems: A survey and some new directions, arXiv preprint
arXiv:1906.05774, 2019.
[5] N. Farsad, N. Shlezinger, A. J. Goldsmith, and Y. C. Eldar, “Data-driven
symbol detection via model-based machine learning, arXiv preprint
arXiv:2002.07806, 2020.
[6] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless
networks: A comprehensive survey, IEEE Commun. Surveys Tuts.,
vol. 20, no. 4, pp. 2595–2621, 2018.
[7] L. Dai, R. Jiao, F. Adachi, H. V. Poor, and L. Hanzo, “Deep learning
for wireless communications: An emerging interdisciplinary paradigm,
IEEE Wireless Commun., vol. 27, no. 4, pp. 133–139, 2020.
[8] W. Saad, M. Bennis, and M. Chen, A vision of 6G wireless systems:
Applications, trends, technologies, and open research problems, IEEE
Network, vol. 34, no. 3, pp. 134–142, 2019.
[9] T. Raviv, S. Park, O. Simeone, Y. C. Eldar, and N. Shlezinger, “Adaptive
and flexible model-based AI for deep receivers in dynamic channels,
IEEE Wireless Commun., early access, 2023.
[10] L. Chen, S. T. Jose, I. Nikoloska, S. Park, T. Chen, and O. Simeone,
“Learning with limited samples: Meta-learning and applications to com-
munication systems, Foundations and Trends® in Signal Processing,
vol. 17, no. 2, pp. 79–208, 2023.
[11] S. T. Jose and O. Simeone, “Free energy minimization: A unified
framework for modelling, inference, learning, and optimization, IEEE
Signal Process. Mag., vol. 38, no. 2, pp. 120–125, 2021.
[12] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration
of modern neural networks, in International Conference on Machine
Learning. PMLR, 2017, pp. 1321–1330.
[13] O. Simeone, Machine learning for engineers. Cambridge University
Press, 2022.
[14] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, “Model-based
deep learning, Proc. IEEE, vol. 111, no. 5, pp. 465–499, 2023.
[15] N. Shlezinger, Y. C. Eldar, and S. P. Boyd, “Model-based deep learning:
On the intersection of deep learning and optimization, IEEE Access,
vol. 10, pp. 115 384–115 398, 2022.
[16] N. Shlezinger and Y. C. Eldar, “Model-based deep learning, Founda-
tions and Trends® in Signal Processing, vol. 17, no. 4, pp. 291–416,
2023.
[17] V. Monga, Y. Li, and Y. C. Eldar, Algorithm unrolling: Interpretable,
efficient deep learning for signal and image processing, IEEE Signal
Process. Mag., vol. 38, no. 2, pp. 18–44, 2021.
[18] N. Shlezinger and T. Routtenberg, “Discriminative and generative learn-
ing for the linear estimation of random signals [lecture notes], IEEE
Signal Process. Mag., vol. 40, no. 6, pp. 75–82, 2023.
[19] T. Raviv, S. Park, O. Simeone, Y. C. Eldar, and N. Shlezinger, “Online
meta-learning for hybrid model-based deep receivers, IEEE Trans.
Wireless Commun., vol. 22, no. 10, pp. 6415–6431, 2023.
[20] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “Data-driven
factor graphs for deep symbol detection, in Proc. IEEE ISIT, 2020, pp.
2682–2687.
[21] N. Shlezinger, R. Fu, and Y. C. Eldar, “DeepSIC: Deep soft interference
cancellation for multiuser MIMO detection, IEEE Trans. Wireless
Commun., vol. 20, no. 2, pp. 1349–1362, 2021.
[22] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. Goldsmith, “Viterbinet:
A deep learning based Viterbi algorithm for symbol detection, IEEE
Trans. Wireless Commun., vol. 19, no. 5, pp. 3319–3331, 2020.
[23] T. Raviv, N. Raviv, and Y. Be’ery, “Data-driven ensembles for deep and
hard-decision hybrid decoding, in Proc. IEEE ISIT, 2020, pp. 321–326.
[24] T. Van Luong, N. Shlezinger, C. Xu, T. M. Hoang, Y. C. Eldar, and
L. Hanzo, “Deep learning based successive interference cancellation
for the non-orthogonal downlink, IEEE Trans. Veh. Technol., vol. 71,
no. 11, pp. 11 876–11 888, 2022.
[25] P. Jiang, T. Wang, B. Han, X. Gao, J. Zhang, C.-K. Wen, S. Jin, and G. Y.
Li, “AI-aided online adaptive OFDM receiver: Design and experimental
results,IEEE Trans. Wireless Commun., vol. 20, no. 11, pp. 7655–7668,
2021.
[26] L. V. Jospin, H. Laga, F. Boussaid, W. Buntine, and M. Bennamoun,
“Hands-on Bayesian neural networks—a tutorial for deep learning
users, IEEE Comput. Intell. Mag., vol. 17, no. 2, pp. 29–48, 2022.
[27] D. T. Chang, “Bayesian neural networks: Essentials, arXiv preprint
arXiv:2106.13594, 2021.
[28] H. Wang and D.-Y. Yeung, A survey on Bayesian deep learning, ACM
Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–37, 2020.
[29] M. Zecchin, S. Park, O. Simeone, M. Kountouris, and D. Gesbert,
“Robust Bayesian learning for reliable wireless AI: Framework and
applications, IEEE Trans. on Cogn. Commun. Netw., vol. 9, no. 4, pp.
897–912, 2023.
[30] K. M. Cohen, S. Park, O. Simeone, and S. Shamai, “Bayesian active
meta-learning for reliable and efficient AI-based demodulation, IEEE
Trans. Signal Process., vol. 70, pp. 5366–5380, 2022.
[31] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and
Y. Be’ery, “Deep learning methods for improved decoding of linear
codes, IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131,
2018.
[32] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear
codes using deep learning, in Annual Allerton Conference on Commu-
nication, Control, and Computing (Allerton), 2016.
[33] W.-J. Choi, K.-W. Cheong, and J. M. Cioffi, “Iterative soft interference
cancellation for multiple antenna systems, in Proc. IEEE WCNC, 2000.
[34] J. Pearl, Probabilistic reasoning in intelligent systems: networks of
plausible inference. Elsevier, 2014.
[35] M. Tomlinson, C. J. Tjhai, M. A. Ambroze, M. Ahmed, and M. Jibril,
Error-Correction Coding and Decoding: Bounds, Codes, Decoders,
Analysis and Applications. Springer Nature, 2017.
[36] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation:
Representing model uncertainty in deep learning, in International
Conference on Machine Learning. PMLR, 2016, pp. 1050–1059.
[37] S. Boluki, R. Ardywibowo, S. Z. Dadaneh, M. Zhou, and X. Qian,
“Learnable Bernoulli dropout for Bayesian deep learning, in Interna-
tional Conference on Artificial Intelligence and Statistics. PMLR, 2020,
pp. 3905–3916.
[38] L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin,
F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel
model, IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, 2012.
[39] M. P. Naeini, G. Cooper, and M. Hauskrecht, “Obtaining well calibrated
probabilities using bayesian binning, in Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 29, no. 1, 2015.
[40] P. Becker, H. Pandya, G. Gebhardt, C. Zhao, C. J. Taylor, and
G. Neumann, “Recurrent Kalman networks: Factorized inference in
high-dimensional deep feature spaces, in International Conference on
Machine Learning, 2019, pp. 544–552.
[41] G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. Van Sloun, and
Y. C. Eldar, “KalmanNet: Neural network aided Kalman filtering for
partially known dynamics, IEEE Trans. Signal Process., vol. 70, pp.
1532–1547, 2022.
[42] D. H. Shmuel, J. P. Merkofer, G. Revach, R. J. van Sloun, and
N. Shlezinger, “SubspaceNet: Deep learning-aided subspace methods
for DoA estimation, arXiv preprint arXiv:2306.02271, 2023.
[43] J. Knoblauch, J. Jewson, and T. Damoulas, “Generalized variational
inference: Three arguments for deriving new posteriors, arXiv preprint
arXiv:1904.02063, 2019.
[44] A. Masegosa, “Learning under model misspecification: Applications to
variational and ensemble methods, Advances in Neural Information
Processing Systems, vol. 33, pp. 5479–5491, 2020.
[45] W. R. Morningstar, A. Alemi, and J. V. Dillon, “PACm-Bayes: Narrowing the empirical risk gap in the misspecified Bayesian regime,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 8270–8298.
[46] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a
random world. Springer, 2005, vol. 29.
[47] A. N. Angelopoulos and S. Bates, A gentle introduction to confor-
mal prediction and distribution-free uncertainty quantification, arXiv
preprint arXiv:2107.07511, 2021.
[48] K. M. Cohen, S. Park, O. Simeone, and S. Shamai, “Calibrating AI
models for wireless communications via conformal prediction, IEEE
Trans. Mach. Learn. Commun. Netw., vol. 1, pp. 296–312, 2023.