Adaptive and Flexible Model-Based AI for Deep
Receivers in Dynamic Channels
Tomer Raviv, Sangwoo Park, Osvaldo Simeone, Yonina C. Eldar, and Nir Shlezinger
Abstract—Artificial intelligence (AI) is envisioned to play
a key role in future wireless technologies, with deep neural
networks (DNNs) enabling digital receivers to learn to operate in
challenging communication scenarios. However, wireless receiver
design poses unique challenges that fundamentally differ from
those encountered in traditional deep learning domains. The
main challenges arise from the limited power and computational
resources of wireless devices, as well as from the dynamic nature
of wireless communications, which causes continual changes
to the data distribution. These challenges impair conventional
AI based on highly-parameterized DNNs, motivating the devel-
opment of adaptive, flexible, and light-weight AI for wireless
communications, which is the focus of this article. Here, we
propose that AI-based design of wireless receivers requires
rethinking of the three main pillars of AI: architecture, data, and
training algorithms. In terms of architecture, we review how to
design compact DNNs via model-based deep learning. Then, we
discuss how to acquire training data for deep receivers without
compromising spectral efficiency. Finally, we review efficient,
reliable, and robust training algorithms via meta-learning and
generalized Bayesian learning. Numerical results are presented
to demonstrate the complementary effectiveness of each of the
surveyed methods. We conclude by presenting opportunities for
future research on the development of practical deep receivers.
I. INTRODUCTION
Wireless communication technologies are subject to esca-
lating demands for connectivity and throughput, with rapid
growth in media transmissions, including images, videos, and,
in the near future, augmented and virtual reality. Furthermore,
transformative applications such as the Internet of Things
(IoT), autonomous driving, and smart manufacturing are ex-
pected to play major roles in the new 5G-defined deployments
of ultra-reliable and low-latency communication (URLLC) and
massive machine-type communications (mMTC) services. To
accommodate these scenarios, communication systems must
meet strict performance requirements in reliability, latency, and
complexity [1].
This project has received funding from the Israeli 5G-WIN consortium,
the European Union's Horizon 2020 research and innovation program
under grants No. 646804-ERC-COG-BNYQ and 725731, and from the
European Union's Horizon Europe project CENTRIC (101096379). Support
is also acknowledged from the Israel Science Foundation under grant
No. 0100101, and from an Open Fellowship of the EPSRC with reference
EP/W024101/1. T. Raviv and N. Shlezinger are with the School
of ECE, Ben-Gurion University of the Negev, Beer-Sheva, Israel (e-mail:
tomerravi[email protected]; [email protected]). S. Park and O. Simeone are
with the Department of Engineering, King's College London, U.K. (e-mail:
{sangwoo.park; osvaldo.simeone}@kcl.ac.uk). Y. C. Eldar is with the Faculty
of Math and CS, Weizmann Institute of Science, Rehovot, Israel.

To facilitate meeting these performance requirements, emerging
technologies such as mmWave and THz communication, holographic
multiple-input multiple-output (MIMO), spectrum sharing, and
intelligent reconfigurable surfaces (IRSs) are currently being
investigated. While these tech-
nologies may support desired performance levels, they also
introduce substantial design and operating complexity [1]. For
instance, holographic MIMO hardware is likely to introduce
non-linearities on transmission and reception; the presence of
IRSs complicates channel estimation; and classical communi-
cation models may no longer apply in novel settings such as
the mmWave and THz spectrum, due to violations of far-field
assumptions and lossy propagation. This paper addresses the
latter source of complexity by focusing on efficient design of
receiver processing.
Traditional receiver processing design is model-based, re-
lying on simplified channel models, which, as mentioned,
may no longer be adequate to meet the requirements of next-
generation wireless systems. The rise of deep learning as an
enabling technology for artificial intelligence (AI) has revo-
lutionized various disciplines, ranging from computer vision
and natural language processing (NLP) to speech refinement
and biomedical signal processing. The ability of deep neural
networks (DNNs) to learn mappings from data has spurred
growing interest in their usage for receiver design in digital
communications [2], [3]. DNN-aided receivers, referred to
henceforth as deep receivers, have the ability to succeed where
classical algorithms may fail. Specifically, deep receivers
can learn a detection function in scenarios having no well-
established physics-based mathematical model, a situation
known as model-deficit; or in settings for which the model
is too complex to give rise to tractable and efficient model-
based algorithms, a situation known as algorithm-deficit.
Consequently, deep receivers have the potential to meet the
constantly growing requirements of wireless systems.
Despite their promise, several core challenges arise from
the fundamental differences between wireless communications
and traditional AI domains such as computer vision and
NLP, limiting the widespread applicability of deep learning in
wireless communications. The first challenge is attributed to
the nature of the devices employed in communication systems.
Wireless communication receivers are highly constrained in
terms of their computational ability, battery consumption, and
memory resources. However, deep learning inherently relies
on highly parameterized architectures, assuming the avail-
ability of powerful devices, e.g., high-performance computing
servers.
A second challenge stems from the nature of the wire-
less communication domain. Communication channels are
dynamic, implying that the receiver task, dictated by the
data distribution, changes over time. This makes the standard
arXiv:2305.07309v1 [cs.IT] 12 May 2023
Pillar              Method                      Literature
Architecture        Deep unfolding              [4]-[6]
                    DNN-aided inference         [7]
Data                Self-supervised training    [7]-[10]
                    Data augmentation           [11], [12]
Training            Meta-learning               [13]-[16]
Algorithm           Bayesian learning           [17], [18]
                    Modular training            [13]

Fig. 1: A summary of methods surveyed in this article that adapt the three pillars of AI to the requirements of deep wireless
receivers.
pipeline of data collection, annotation, and training highly
challenging. Specifically, DNNs rely on (typically labelled)
data sets to learn from the underlying unknown, but stationary,
data distributions. For example, machine translation tasks,
requiring the mapping of an origin language into a destination
language, do not change over time, enabling the collection of
a large volume of training data and the deployment of a pre-
trained, static, DNN. This is not the case for wireless receivers,
whose processing task depends on the time-varying channel,
restricting the size of the training data set representing the
task.
The two challenges outlined above imply that successfully
applying AI for wireless receiver design requires deviating
from conventional deep learning approaches. To this end, there
is a need to develop communication-oriented AI techniques,
which are the focus of this article. Previous tutorials on AI for
communications, e.g., [2], [3], have primarily concentrated on
surveying diverse challenges and applications of conventional
deep learning methods in the context of communication. In
contrast, the present article aims to review approaches that
address the unique challenges in the design of deep receivers
that arise from the mentioned limitations of wireless devices
and from the dynamic nature of the communication domain.
Our main objective is to provide a systematic review of
research directions that target the practical deployment of deep
receivers.
We commence by motivating the development of AI systems
that are light-weight, and thus operable on power and hardware
limited devices, as well as adaptive and flexible, enabling
online on-device adaptation. As illustrated in Fig. 1, we
then propose that AI-based wireless receiver design requires
revisiting the three main pillars of AI, namely (i) the archi-
tecture of AI models; (ii) the data used to train AI models;
and (iii) the training algorithm that optimizes the AI model
for generalization, i.e., to maximize performance outside the
training set (either on the same distribution or for a completely
new one).
For each of these AI pillars, we survey candidate approaches
for facilitating the operation of the deep receivers. (i) We first
discuss how to design light-weight trainable architectures via
model-based deep learning [19]. This methodology hinges on
the principled incorporation of model-based processing, ob-
tained from domain knowledge on optimized communication
algorithms, within AI architectures. (ii) Next, we investigate
how labelled data can be obtained without impairing spectral
efficiency, i.e., without increasing the pilot overhead. To this
end, we show how receivers can generate labelled data by self-
supervision aided by existing communication algorithms; and
how they can further enrich data sets via data augmentation
techniques that utilize invariance properties of communication
systems. (iii) Finally, we cover training algorithms for deep
receivers that are designed to meet requirements in terms of
efficiency, reliability, and robust adaptation of wireless com-
munication systems, avoiding overfitting from limited training
data while limiting training time. These methods include
communication-specific meta-learning as well as generalized
Bayesian learning and modular learning.
To illustrate the individual and complementary gains of the
reviewed approaches, we provide a numerical study consider-
ing finite-memory single-input single-output (SISO) channels
as well as multi-user MIMO systems. We conclude by dis-
cussing the road ahead, as well as key research challenges
that are yet to be addressed to enable adaptive and flexible
light-weight deep receivers.
II. DEEP RECEIVERS IN DYNAMIC CHANNELS
As discussed in the previous section, harnessing the
potential of deep learning in wireless systems requires
communication-specific AI schemes that are adaptive, flexible,
and light-weight. The light-weight requirement follows from
the power and computational constraints of wireless devices;
while the need for adaptivity and flexibility is entailed by the
dynamic nature of wireless channels. Classical model-based
receiver processing is inherently adaptive and flexible: The
receiver periodically estimates the channel using the available
pilots, and then uses this estimate to adapt the operation of
the receiver baseband chain, which is a direct function of
the channel coefficients. In contrast, for deep receivers, the
dependence of the weights of the DNN on the channel state
is indirect, and hence designing flexible, channel-adaptive,
DNN-based processing is a non-trivial task.
The current state of the art on deep receivers encompasses the
following three main approaches to address channel variations.
A1 Joint Learning: The most straightforward approach
amounts to optimizing a single DNN model to maximize
performance on average over a broad range of channel
Fig. 2: Overall illustration of online training of deep receivers in time-varying channels.
conditions. Methods in this class train a DNN using data
corresponding to an extensive set of expected channel
realizations, aiming to learn a mapping that is tailored
to the distribution of the channel [20]. Accordingly,
joint learning may be thought of as seeking the optimal
non-coherent receiver, which is agnostic to the current
channel realization. As a result, performance degradation
as compared to a coherent receiver is generally to be
expected.
A2 Channel as Input: An alternative approach uses an instan-
taneous estimate of the channel as an additional input
to the DNN [21]. Among the main drawbacks of this
approach are the limited flexibility in accommodating
different system dimensions, e.g., number of antennas or
number of users, and the lack of structure in the way
different inputs, such as received signals and channel state
information, are handled.
A3 Online Training: As illustrated in Fig. 2, in online train-
ing, decoded data from prior blocks is used, alongside
new pilots, to adapt the deep receiver to channel varia-
tions. This class of approaches inherits the limitations of
continual learning, such as catastrophic forgetting, and is
generally not suitable to ensure fast adaptation.
The shortcomings of the three existing approaches reviewed
above motivate a fundamental rethinking of
the application of machine learning tools to wireless receivers
along the three directions illustrated in Fig. 1.
The architecture of the DNN should be carefully selected
on the basis of domain knowledge so as to reduce data re-
quirements, while also ensuring efficient implementation
of the model. This amounts to improvements in terms of
the inductive bias on which learning is based.
The data used for learning should be augmented, when
possible, by leveraging the inherent redundancies of en-
coded signals.
The training algorithm should make use of historical
data while also preparing for quick adaptation to changing
channel conditions.
In the following sections we review candidate approaches for
each of these aspects, as summarized in Fig. 1.
III. ARCHITECTURE
The standard neural architectures employed in AI systems
for communication are based on highly-parameterized, un-
structured, deep neural models such as feed-forward neural
networks. The over-parameterization has been found to be
advantageous in a host of other tasks, such as NLP. However,
since deep receivers should adapt to time-varying conditions
using limited training data, this type of architecture is typi-
cally undesirable. In this section, we introduce ways to design
tailored model architectures by leveraging domain knowledge
with the goal of improving adaptivity and data efficiency. In
Sec. V, we will also study data-driven approaches for the
optimization of the inductive bias (also known as meta-
learning) and see how they can be combined with the model-
driven architectures introduced in this section to further reduce
the generalization gap.
In model-based deep learning, DNN architectures are de-
signed by drawing inspiration from model-based algorithms
tailored to the particular problem of interest [19]. In the context
of deep receivers, the dominant model-based deep learning
methodologies are deep unfolding and DNN-aided inference,
which are illustrated in Fig. 3 and discussed next.
Many model-based algorithms used by wireless receivers
rely on iterative optimizers that operate by gradually improv-
ing an optimization variable based on an objective function.
Deep unfolding converts an iterative optimizer into a discrim-
inative AI model by introducing trainable parameters within
each of a fixed number of iterations [19]. Training a deep
unfolding architecture can thus adapt an iterative optimizer on
the basis of available data for a given problem of interest. As
Fig. 3: Illustration of model-based, data-driven, and model-based deep learning framework for deep receivers.
we detail next, the aim is to address model and/or algorithmic
deficiencies of the original algorithm.
Specifically, deep unfolding enhances iterative optimizers in
the following ways (see [19] for further details).
Learned Hyperparameters: Iterative optimizers often in-
clude hyperparameters, such as step-sizes, damping fac-
tors, and regularization coefficients, that are typically
tuned by hand by the designer and shared among all
iterations. Deep unfolding can treat such hyperparameters
as trainable parameters. This is useful to cope with forms
of algorithm deficiency, whereby an iterative algorithm
requires too many iterations or struggles to converge to a
suitable decision. For example, the work [4] showed that
unfolding the orthogonal approximate message passing
algorithm for MIMO detection, and learning iteration-
dependent scaling coefficients, notably improves perfor-
mance, requiring only a few iterations.
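To make the idea concrete, the following sketch unfolds plain gradient descent on a toy least-squares detection objective into a fixed number of layers with per-iteration step sizes; in a deep-unfolded receiver these step sizes would be trained from data. This is an illustrative stand-in, not the OAMP-based detector of [4], and the channel, symbols, and step-size values are all made up for the example.

```python
import numpy as np

def unfolded_detector(y, H, step_sizes):
    """Run one gradient-descent iteration per entry of `step_sizes`.

    Each entry plays the role of a trainable, iteration-dependent
    hyperparameter; training would tune `step_sizes` from data.
    """
    x = np.zeros(H.shape[1])
    for mu in step_sizes:                      # fixed number of "layers"
        grad = H.T @ (H @ x - y)               # gradient of 0.5*||y - Hx||^2
        x = x - mu * grad                      # learned per-iteration step
    return x

# Toy example: 4x4 channel, noiseless observation.
rng = np.random.default_rng(0)
H = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
x_true = np.array([1.0, -1.0, 1.0, 1.0])       # BPSK-like symbols
y = H @ x_true

# Hand-picked step sizes stand in for trained ones.
x_hat = unfolded_detector(y, H, step_sizes=[0.5] * 50)
print(np.round(x_hat, 2))                      # close to x_true
```

Fixing the number of iterations at training time is what turns the optimizer into a trainable architecture with a predictable, low inference cost.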
Learned Objective: Deep unfolding can also enhance an
iterative algorithm by tuning the objective function ap-
proximately optimized at each iteration. This optimization
addresses algorithm deficiencies, in a manner similar to
the optimization of hyperparameters; as well as model
deficiencies by adapting the design criterion to observed
data, rather than to assumptions about the model. A
representative example is the MMNet architecture pro-
posed in [5] for unfolding MIMO detection. MMNet,
which is based on proximal gradient steps, parameterizes
the gradient computation procedure at each iteration,
effectively using an iteration-dependent design objective.
DNN Conversion: The third form of deep unfolding incor-
porates a full DNN module within each iteration in order
to implement some functionality of the solver in the most
flexible manner. DNN conversion is suitable for handling
model deficiency, since the DNN modules can learn how
to best realize model-independent internal computations
at each iteration. For instance, DeepSIC proposed in [6]
is derived from the iterative soft interference cancellation
(SIC) MIMO detection algorithm with the introduction of
DNN models for implementing each stage of interference
cancellation and soft detection in a manner agnostic to the
underlying channel model.
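A minimal sketch of this structure follows, assuming a two-user scalar channel and a bare tanh map in place of each per-user DNN module; it is a schematic of sequential soft interference cancellation in the spirit of DeepSIC, not the actual architecture of [6], and all channel values and weights are illustrative.

```python
import numpy as np

def sic_module(y, h_self, h_others, soft_others, w):
    """One SIC building block for a single user.

    A trainable function (here a bare linear map with weight `w`,
    standing in for a small DNN) maps the interference-cancelled
    observation to a soft BPSK symbol estimate.
    """
    y_clean = y - h_others @ soft_others       # soft interference cancellation
    return np.tanh(w * h_self * y_clean)       # soft symbol in (-1, 1)

def deep_sic_sketch(y, h, weights, num_iters=3):
    """Iterate per-user modules; `weights[q][k]` is user k's weight at iteration q."""
    K = len(h)
    soft = np.zeros(K)                         # initial soft estimates
    for q in range(num_iters):
        for k in range(K):                     # sequential per-user update
            others = np.delete(np.arange(K), k)
            soft[k] = sic_module(y, h[k], h[others], soft[others], weights[q][k])
    return np.sign(soft)

h = np.array([1.0, 0.7])                       # 2-user scalar channel
x = np.array([1.0, -1.0])                      # transmitted BPSK symbols
y = h @ x                                      # noiseless received sample
w = [[4.0, 4.0]] * 3                           # stand-in for trained weights
print(deep_sic_sketch(y, h, w))                # -> [ 1. -1.]
```

Because each module has a concrete per-user role, individual modules can later be retrained in isolation, which is exactly the property that modular learning (Sec. V) exploits.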
DNN-aided inference refers to a family of model-based
deep learning methods that incorporate DNNs into model-
based methods that do not implement iterative processing. A
representative example is the ViterbiNet equalizer proposed
in [7]. Viterbi equalization is applicable to any finite-memory
channel, as long as one can compute the conditional distri-
bution of channel output given the corresponding input, also
known as likelihood. Based on this observation, ViterbiNet
implements the Viterbi algorithm while using a DNN to
compute the likelihood. In this way, ViterbiNet addresses
model deficiencies by operating in a channel-model-agnostic
manner and requiring only the conventional finite-memory
modelling assumption to hold.
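The separation that ViterbiNet exploits can be sketched as follows: the trellis search is standard Viterbi, and the branch metric is an exchangeable function. Here `neg_log_likelihood` is a hand-crafted Gaussian metric for a known two-tap channel; ViterbiNet would replace exactly this function with a trained DNN while leaving the search untouched. All channel values are illustrative.

```python
import numpy as np

H = np.array([0.8, 0.4])                # 2-tap channel (known here for the demo)
SYMBOLS = (-1.0, 1.0)                   # BPSK alphabet

def neg_log_likelihood(y_t, x_t, x_prev):
    """Branch metric -log p(y_t | x_t, x_prev), up to constants.

    ViterbiNet would replace this hand-crafted Gaussian metric with a
    DNN trained to output the likelihood, keeping the trellis intact.
    """
    mean = H[0] * x_t + H[1] * x_prev
    return (y_t - mean) ** 2

def viterbi_detect(y):
    """Standard Viterbi search over states = previous symbol."""
    costs = {s: 0.0 for s in SYMBOLS}           # path cost per state
    paths = {s: [] for s in SYMBOLS}            # surviving path per state
    for y_t in y:
        new_costs, new_paths = {}, {}
        for x_t in SYMBOLS:                     # next state is x_t
            cands = [(costs[s] + neg_log_likelihood(y_t, x_t, s), s)
                     for s in SYMBOLS]
            best_cost, best_prev = min(cands)
            new_costs[x_t] = best_cost
            new_paths[x_t] = paths[best_prev] + [x_t]
        costs, paths = new_costs, new_paths
    return paths[min(costs, key=costs.get)]

x = [1.0, -1.0, -1.0, 1.0, 1.0]
y = [H[0] * x[t] + H[1] * (x[t - 1] if t else 0.0) for t in range(len(x))]
print(viterbi_detect(y))                # recovers x
```

The DNN thus only needs to learn a per-sample conditional distribution, a far smaller task than learning the entire sequence detector end-to-end.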
IV. DATA
The amount of data obtained from pilots is typically insuf-
ficient to train an AI model for a deep receiver. This motivates
the introduction of strategies that expand the available labelled
training data set without requiring the transmission of more
pilots. As we detail in this section, existing techniques apply
either self-supervised learning or data augmentation.
With self-supervised learning, training data is extended
using the redundancy of transmitted signals either at the
symbol level or at the codeword level. In contrast, in data
augmentation, the goal is to enrich the given labelled data set
by leveraging invariance properties of the data. As summarized
in Fig. 4, these approaches can be potentially combined, and
integrated with a number of different architectures (Sec. III)
and training algorithms (Sec. V).
Codeword-level self-supervision exploits the presence of
channel coding to generate labelled data from channel outputs.
It uses error correction codes to correct detection errors, and
then utilizes the corrected data as labelled data for training, as
long as the codewords are decoded successfully [7]–[9].
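A toy illustration of this pipeline follows, using a single parity-check bit per block as a stand-in for a full FEC decoder and its success indication (real systems would use the channel decoder and, e.g., a CRC); the block length and bit patterns are illustrative.

```python
import numpy as np

def harvest_labels(detected_bits, block_len=8):
    """Codeword-level self-supervision with a toy single parity-check code.

    Hard decisions from the detector are split into blocks whose bits
    must have even parity. Blocks that pass the check are assumed to be
    decoded correctly and recycled as labelled training data; the rest
    are discarded.
    """
    labelled = []
    for i in range(0, len(detected_bits) - block_len + 1, block_len):
        block = detected_bits[i:i + block_len]
        if int(np.sum(block)) % 2 == 0:          # parity check passes
            labelled.append(block)               # reuse as labelled data
    return labelled

# Two parity-consistent codewords and one corrupted by a single bit flip.
bits = np.array([1, 0, 1, 0, 1, 0, 1, 0,   # parity OK
                 1, 1, 1, 1, 1, 1, 1, 0,   # parity fails (one flip)
                 0, 0, 1, 1, 0, 0, 1, 1])  # parity OK
print(len(harvest_labels(bits)))           # -> 2
```

Only blocks that survive the check enter the training set, so no extra pilots are transmitted and label noise stays bounded by the code's undetected-error rate.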
Fig. 4: Data acquisition pipeline for deep receivers without
harming spectral efficiency.
Symbol-level self-supervision obtains labelled data from
information symbols without relying on channel decoding.
This is useful since some symbols can be correctly detected
even when decoding of the overall codeword fails. Symbol-level
self-supervision hence requires reliable soft detection measures
to indicate the degree to which each information symbol may
be considered to be correctly received [10].
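In its simplest form, this amounts to thresholding the detector's soft outputs; the sketch below (threshold and all values illustrative) keeps only confident hard decisions as labels, trading label quantity against label noise.

```python
import numpy as np

def select_confident(soft_probs, hard_symbols, threshold=0.9):
    """Symbol-level self-supervision: keep only decisions whose soft
    detection confidence exceeds `threshold`, and use them as labels.

    `soft_probs[t]` is the detector's probability for its own hard
    decision `hard_symbols[t]`; the threshold is a design knob.
    """
    mask = np.asarray(soft_probs) >= threshold
    return np.asarray(hard_symbols)[mask], mask

probs = [0.99, 0.55, 0.97, 0.62, 0.95]   # confidence per detected symbol
syms = [1, -1, -1, 1, 1]                 # hard decisions
labels, mask = select_confident(probs, syms)
print(labels)                            # -> [ 1 -1  1]
```

Note that this only works when the soft outputs are well calibrated, which is one reason reliability-oriented training (Sec. V) matters for deep receivers.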
Data augmentation is an established framework in conven-
tional AI domains to enrich training sets by leveraging known
invariances in the data. For instance, for image classification,
one can use a single image to generate multiple images
with the same label by rotating or cropping it. While data
augmentation is quite common in AI, it is highly geared to-
wards image and language data. Data augmentations for digital
communications have been explored in [11], and more recently
in [12]. The techniques studied in [12] include leveraging
the symmetry in digital constellations to project error pat-
terns between different symbols; exploiting the independence
between the noise and the transmitted symbols to generate
additional noisy realizations; and accounting for forms of
invariance to constellation-preserving rotations exhibited by
wireless channels.
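A sketch of two such augmentations for QPSK pilot pairs, loosely in the spirit of [12]: noise injection (valid because the additive noise is independent of the transmitted symbols) and 90-degree constellation rotation (valid because rotating both the received sample and its label preserves the channel law). The noise level, rotation set, and number of copies are illustrative choices.

```python
import numpy as np

def augment(pilot_rx, pilot_syms, noise_std, copies=2,
            rotations=(1, 1j, -1, -1j)):
    """Enrich (received sample, transmitted symbol) QPSK pilot pairs."""
    rng = np.random.default_rng(0)
    aug_rx, aug_syms = list(pilot_rx), list(pilot_syms)
    for r, s in zip(pilot_rx, pilot_syms):
        for _ in range(copies):                      # noise injection
            aug_rx.append(r + noise_std * (rng.standard_normal()
                                           + 1j * rng.standard_normal()))
            aug_syms.append(s)
        for rot in rotations[1:]:                    # constellation rotation
            aug_rx.append(rot * r)                   # rotate sample...
            aug_syms.append(rot * s)                 # ...and its label
    return np.array(aug_rx), np.array(aug_syms)

rx = np.array([0.9 + 1.1j])          # one received QPSK pilot sample
sym = np.array([1 + 1j])             # its transmitted symbol
a_rx, a_sym = augment(rx, sym, noise_std=0.1)
print(len(a_rx))                     # 1 original + 2 noisy + 3 rotated = 6
```

Each pilot thus yields several valid training pairs at zero cost in spectral efficiency.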
V. TRAINING
The training algorithm addresses the optimization of a data-
dependent loss function, with the goal of identifying models
with satisfactory generalization performance. The performance
of a training algorithm depends, in practice, on (i) the choice of
the loss function; (ii) the optimization algorithm; and (iii) the
relevance and quality of the data used to evaluate the train-
ing loss. In this section, we review communication-oriented
approaches for designing adaptive, data-efficient training al-
gorithms for deep receivers based on (i) meta-learning, (ii)
generalized Bayesian learning, and (iii) modular learning.
Meta-learning is a general framework that seeks to obtain a
data-efficient training procedure that is applicable for multiple
tasks of interest [15]. A training procedure that is data-, or
sample-, efficient is able to achieve a small generalization
gap, while using a small amount of training data. Meta-
learning and model-based learning (see Sec. III) are two
complementary approaches that reduce the generalization gap
under a fixed amount of training data: The former is data-
driven and typically optimizes the training algorithm; while
the latter is model-driven and optimizes the architecture. While
meta-learning encompasses a variety of conceptually distinct
methods, the prominent approaches for application to deep
receivers are gradient-based meta-learning and hypernetwork-
based meta-learning.
Gradient-based meta-learning: Gradient-based meta-
learning optimizes some of the hyperparameters of a
first-order training algorithm. While in principle, one
could “meta-learn” any hyperparameter, such as the learn-
ing rate, optimizing the initial weights of the DNNs
has been found to be extremely beneficial for boosting
adaptation and flexibility of training procedures in many
applications, including in wireless communications [15].
DNN initialization is a form of inductive bias, since the
parametric function space of the DNN becomes restricted
by enforcing adherence to the initialization through a
limited number of gradient-based updates. Meta-learning
can be combined with a model-based inductive bias, as
demonstrated in [13].
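The flavor of initialization learning can be conveyed with the first-order Reptile variant of gradient-based meta-learning, here on scalar toy "channels"; all models and numbers are illustrative, and an actual deep receiver would meta-learn DNN weights from past channel realizations as in [13].

```python
import numpy as np

def adapt(theta, task_data, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one task's pilot data
    under the toy model y = theta * x."""
    x, y = task_data
    for _ in range(steps):
        grad = np.mean(2 * x * (theta * x - y))   # d/dtheta of (theta*x - y)^2
        theta = theta - lr * grad
    return theta

def reptile(tasks, meta_lr=0.5, meta_iters=100):
    """Outer loop (Reptile, a first-order cousin of MAML): nudge the
    initialization toward each task's adapted weights, so future tasks
    are solvable in a handful of gradient steps."""
    theta0 = 0.0
    for it in range(meta_iters):
        task = tasks[it % len(tasks)]
        theta0 += meta_lr * (adapt(theta0, task) - theta0)
    return theta0

# Toy "channels": y = a * x with a in {1.8, 2.2}; a good initialization
# sits near a = 2, halving the adaptation work for any new task.
x = np.ones(10)
tasks = [(x, 1.8 * x), (x, 2.2 * x)]
theta0 = reptile(tasks)
print(round(theta0, 2))
```

The meta-learned initialization acts as the inductive bias discussed above: online adaptation then only needs a few inner-loop steps on fresh pilots.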
Hypernetwork-based meta-learning: Gradient-based
meta-learning requires running a number of (stochastic)
gradient updates. An alternative approach that does
not require any additional real-time optimization for
adaptation to new tasks incorporates a hypernetwork in
the system, alongside the main DNN. The hypernetwork
takes as input the available data, or any other context
information, regarding the task of interest, and produces
at the output the weights of the main DNN. More
precisely, typically, only a subset of weights of the
main DNN are updated; and/or each output of the
hypernetwork affects simultaneously a group of weights,
e.g., in the same layer, of the main DNN. Hypernetwork-
based meta-learning has been applied successfully
in wireless communication systems, including for
beamforming and MIMO detection [14], [16].
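Schematically, the hypernetwork maps context (here, a channel vector) directly to the weights of the main model, with no gradient steps at run time. Both networks are reduced to linear maps in this contrived sketch, chosen so that a matched filter is optimal and an identity hypernetwork is already "trained"; everything here is illustrative.

```python
import numpy as np

def hypernetwork(context, W_h):
    """Map context (e.g., a channel feature vector) to the weights of
    the main model; a bare linear map stands in for the hypernetwork DNN."""
    return W_h @ context

def main_model(x, w):
    """The main "DNN": a linear detector whose weights are generated,
    not trained, at run time."""
    return np.sign(w @ x)

W_h = np.eye(2)                        # stand-in for trained hypernetwork weights
h = np.array([0.9, 0.3])               # current channel (the context input)
x = h * 1.0                            # received signal for symbol +1, no noise
w = hypernetwork(h, W_h)               # weights produced without any SGD steps
print(main_model(x, w))                # -> 1.0
```

Adaptation cost is thus a single forward pass through the hypernetwork, which is what makes this approach attractive for fast-varying channels.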
Bayesian learning is the gold standard for training strate-
gies that aim at producing AI models offering a reliable as-
sessment of the uncertainty of their decisions. Such reliable AI
models must output confidence measures that reflect the true
accuracy of their decisions. Bayesian learning boosts reliability
by treating the model parameters as random variables, and
by accordingly maintaining a distribution over the weights
of a DNN. This distribution is meant to capture epistemic
uncertainty in the presence of limited training data.
Bayesian learning involves particle-based, deterministic or
stochastic, procedures, or optimization over the parameters of
the distribution in the model parameter space. Such optimiza-
tion addresses a training criterion that includes an information-
theoretic regularizer enforcing closeness to a prior distribution.
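The effect on soft outputs can be illustrated with particle-based Bayesian prediction: the predictive probability is an average over models drawn from the (approximate) posterior on the weights. The logistic models and posterior samples below are purely illustrative stand-ins for a Bayesian-trained DNN.

```python
import numpy as np

def ensemble_posterior(x, weight_samples):
    """Approximate Bayesian prediction: average the soft outputs of
    models drawn from the posterior over weights. Each "model" is a
    logistic classifier; `weight_samples` stand in for particles
    produced by a Bayesian learning procedure."""
    probs = [1.0 / (1.0 + np.exp(-w * x)) for w in weight_samples]
    return float(np.mean(probs))       # calibrated soft decision

# Near the decision boundary, weight uncertainty pulls the averaged
# confidence toward 0.5, unlike a single overconfident point estimate.
x = 0.3
posterior_samples = [0.5, 2.0, 8.0]    # illustrative posterior particles
p_bayes = ensemble_posterior(x, posterior_samples)
p_point = ensemble_posterior(x, [8.0]) # single (overconfident) model
print(round(p_bayes, 2), round(p_point, 2))
```

The tempered confidence of the averaged prediction is precisely what downstream soft decoders need in order to "trust" the detector's outputs.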
For deep receivers, boosting the reliability of a DNN
model allows the latter to provide informative soft decisions
to downstream DNN-based or model-based modules, e.g., for soft
decoding. This makes it possible for the different modules of
a deep receiver to “trust” the outputs of other modules [18].
Generalized forms of Bayesian learning allow for a flexible
choice of the regularization function, as well as of the data-
fitting part of the training objective. Such methods were shown
to be useful in wireless systems for their capacity to deal with
model misspecification and outliers [17].
Modular learning exploits the interpretable structure of
hybrid model-based deep receivers to facilitate rapid learning
from limited data. As opposed to meta-learning and Bayesian
learning, modular learning is specific to model-based deep
learning architectures. It builds on the fact that, unlike black-
box DNNs, in model-based deep learning architectures, one
can often assign a concrete functionality to different trainable
sub-modules of the architecture, and not just to its input and
output. Each functionality may then be adapted at different
rates and times, as some functionalities may require rapid
adaptation, while the others may be kept unchanged over a
longer time scale.
This approach was applied in [13] for online adaptation
of the DeepSIC MIMO receiver of [6]. There, the ability
to associate different users with sub-modules of the deep
receivers was leveraged to carry out the online training of
sub-modules associated with users that are identified as being
characterized by faster dynamics. The method was shown to
dramatically reduce the number of gradient-based updates and
the amount of data needed for online training.
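A schematic of this selective adaptation, with each per-user sub-module reduced to a scalar weight under a toy linear model; a real deep receiver would instead run SGD on the corresponding DNN sub-modules, as in [13], while the logic of updating only the flagged users is the same.

```python
import numpy as np

def online_update(modules, data_per_user, fast_users, lr=0.05, steps=10):
    """Modular online training: only the sub-modules of users flagged
    as fast-varying are adapted; the rest keep their current weights.

    Each module is a scalar weight w_k fit to user k's pilot pairs
    (x, y) under the toy model y = w_k * x.
    """
    for k in fast_users:                        # skip slowly-varying users
        x, y = data_per_user[k]
        for _ in range(steps):
            grad = np.mean(2 * x * (modules[k] * x - y))
            modules[k] -= lr * grad
    return modules

modules = [1.0, 1.0, 1.0]                       # per-user sub-module weights
x = np.ones(8)
data = [(x, 1.0 * x), (x, 3.0 * x), (x, 1.0 * x)]  # user 1's channel changed
updated = online_update(modules, data, fast_users=[1])
print(np.round(updated, 2))                     # only user 1's module moved
```

Restricting the updates this way is what yields the reported savings in both gradient computations and training data.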
VI. NUMERICAL RESULTS
In this section we showcase the impact of schemes designed
to facilitate light-weight, adaptive, and flexible AI across the
three AI pillars highlighted throughout this article. We focus
on finite-memory SISO channels (with 4 taps) and memoryless
4 × 4 multi-user MIMO time-varying channels with binary
phase shift keying (BPSK) and quadrature phase shift keying
(QPSK) symbols, respectively.¹
Architecture: In each channel, we consider a model-based
DNN architecture, as well as a black-box DNN having roughly
three times more parameters. For the SISO channel, with a
finite channel memory of L symbols, we compare ViterbiNet
[7] with a recurrent neural network (RNN)-based symbol
detector with a window size of L, followed by a linear
layer and the softmax function. For the MIMO channel, the
DeepSIC receiver [6] with three iterations is compared to a
fully connected DNN composed of four layers with ReLU
activations followed by the softmax layer.
Data: For each coherence duration, 200 pilot symbols are
available. We compare standard training with training that
leverages data augmentation. For the latter scheme, at each
time step, the pilot data is enriched with 600 artificial symbols
via a constellation-conserving projection and a translation-
preserving transformation [12].
Training: We consider the following training methodolo-
gies:
Joint training: The receiver is trained offline, using 5000
symbols simulated from a multitude of channel realiza-
tions. No additional training is done at run time.
Online training: The receiver is trained initially using 200
symbols, and then it adapts online by utilizing either the
pilot data or the augmented pilot data.
¹The source code used in our experiments is available at
https://github.com/tomerraviv95/facilitating-adaptation-deep-receivers
Online meta-learning: The training algorithm is opti-
mized via meta-learning that utilizes accumulated training
data from previous channel realizations, while adaptation
takes place via few gradient-based updates from the
online meta-learned initialization [13].
Results: Fig. 5a and Fig. 5b depict the average symbol error
rate (SER) as a function of signal-to-noise ratio (SNR). While
standard black-box models suffer from large generalization
gaps due to the limited availability of training data, deep
receivers with model-based architectures, namely ViterbiNet
[7] and DeepSIC [6], demonstrate successful detection perfor-
mance by adapting to the time-varying channel in an online
manner.
The performance is seen to be further improved by opti-
mizing the training algorithm via meta-learning, as well as by
increasing the data size via data augmentation. Overall, these
results indicate that the reviewed methods are complementary,
contributing to the challenges of adapting to time-varying
channels in different ways. This leads to the conclusion that
designing AI models for communications can benefit from a
rethinking of deep learning tools across all three AI pillars.
VII. FUTURE RESEARCH DIRECTIONS
We conclude by identifying some representative directions
for future research.
A. Deciding When to Train
The schemes surveyed thus far are geared towards enabling
efficient online on-device training. In this regard, a key open
question is how to determine when to train online. Periodically
re-training, e.g., at each coherence period, may be excessively
complex, particularly when channel variations are relatively
smooth. Efficient deep receivers would benefit from monitoring
mechanisms that can determine when to adapt the model
and/or when to meta-learn the inductive bias. One possible
way to achieve this goal is via data drift detection, a topic
widely studied in the machine learning literature [22]. While
some basic drift detection mechanisms can be directly applied
to communication systems, advanced mechanisms that lever-
age communication-specific characteristics may require further
development.
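As a baseline, even a simple window-based monitor over post-decoding error indicators can serve as such a trigger; the window length and threshold factor below are illustrative design knobs, and more refined detectors would track the training loss or soft-output statistics instead.

```python
import numpy as np

def should_retrain(recent_errors, baseline_rate, window=50, factor=2.0):
    """Trigger online re-training when the error rate over a sliding
    window exceeds the post-training baseline by a chosen margin."""
    if len(recent_errors) < window:
        return False                     # not enough evidence yet
    return float(np.mean(recent_errors[-window:])) > factor * baseline_rate

# Error indicators per block: quiet period, then a burst after a
# (simulated) abrupt channel change.
errors = [0] * 60 + [1, 0, 1, 1, 0] * 10
print(should_retrain(np.array(errors), baseline_rate=0.05))   # -> True
```

Coupling such a monitor to the training loop confines costly online adaptation to the periods where the channel has actually drifted.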
B. Fitting the Architecture to the Scenario
Deep receivers are often composed of multiple layers,
wherein each element takes part in the computation. Thus,
even for relatively light-weight architectures, full model com-
putations may incur computational overhead exceeding the
limited resources available, particularly for some edge devices.
This problem is typically tackled via pruning methods, which
aim to bridge the complexity-performance gap by removing
redundant parts of the models. While most existing pruning
methods find a single, input-independent, light-weight model,
for wireless communication systems it may be preferable to
adopt input-dependent, adaptive pruning methods [23], that
can adapt complexity to the current requirements.
(a) SISO: SER vs. SNR. (b) MIMO: SER vs. SNR.
Fig. 5: Average SER after transmission of 300 blocks in a time-varying channel as a function of SNR.
C. Hardware-Aware AI
The schemes surveyed in this paper do not make use of any
special characteristics of the hardware available at the host
device, focusing instead on generic improvements based on
limiting the architecture parameterization and/or the number
of training iterations. Larger efficiency gains are to be expected
with AI methods that are aware of the specific hardware at the
wireless receiver, which may encompass different technologies
such as emerging in-memory computing chips.
D. Continual Bayesian Learning
Bayesian learning was introduced in this article as a promis-
ing solution for deep receivers, thanks to the gains enabled
by more reliable AI modules. Another important advantage of Bayesian learning
is its capacity to support continual learning via the update
of the model parameter distribution [24]. The integration of
online adaptation with Bayesian learning may further enhance
the performance of deep receivers.
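To make the continual-learning mechanism concrete, the sketch below maintains a Gaussian posterior over the weights of a linear observation model and conditions it on each new observation, with the current posterior serving as the prior for the next block. This conjugate linear-Gaussian case is a simplified stand-in for the neural-network posteriors of [24]; all dimensions and variances are illustrative.

```python
import numpy as np

class OnlineGaussianPosterior:
    """Recursive Bayesian update of a Gaussian posterior over the weights
    of a linear model y = x^T w + noise (known noise variance). Each call
    to update() turns the posterior into the prior for the next sample,
    which is the essence of continual Bayesian learning."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.mean = np.zeros(dim)
        self.cov = prior_var * np.eye(dim)
        self.noise_var = noise_var

    def update(self, x, y):
        """Condition the posterior on a single observation (x, y)."""
        s = x @ self.cov @ x + self.noise_var    # predictive variance
        gain = self.cov @ x / s                  # Kalman-style gain
        self.mean = self.mean + gain * (y - x @ self.mean)
        self.cov = self.cov - np.outer(gain, x @ self.cov)
        return self.mean
```

The retained covariance quantifies residual uncertainty, which is what enables the reliability benefits discussed above while protecting against forgetting.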
E. Collaborative Learning and Inference
As mentioned, deep receivers are practically constrained
by the hardware available at the host device. Since deep
receivers are likely to be deployed in environments containing
other, similar devices, this limitation may be mitigated via
resource sharing across devices. Such collaboration may entail
the exchange of data and/or model information, and it may
be supported by device-to-device communication capabilities.
This idea is deeply connected to federated learning [25] and
collaborative inference [26].
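The model-sharing form of such collaboration can be sketched as a weighted average of the parameters contributed by neighboring devices, in the spirit of federated learning [25]; the uniform default weighting below is an illustrative assumption (in practice, weights may reflect local data sizes).

```python
import numpy as np

def federated_average(local_models, weights=None):
    """Aggregate parameter vectors shared by neighboring devices into a
    single model via weighted averaging. Uniform weights are the default;
    data-size-proportional weights are a common alternative."""
    n = len(local_models)
    if weights is None:
        weights = [1.0 / n] * n          # uniform weighting across devices
    return sum(w * np.asarray(m) for w, m in zip(weights, local_models))
```

Each device would then continue local adaptation from the aggregated model, amortizing training cost across the network.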
REFERENCES
[1] “6G - the next hyper connected experience for all,” Samsung 6G Vision, 2020.
[2] L. Dai, R. Jiao, F. Adachi, H. V. Poor, and L. Hanzo, “Deep learning for wireless communications: An emerging interdisciplinary paradigm,” IEEE Wireless Commun., vol. 27, no. 4, pp. 133–139, 2020.
[3] W. Tong and G. Y. Li, “Nine challenges in artificial intelligence and wireless communications for 6G,” IEEE Wireless Commun., vol. 29, no. 4, pp. 140–145, 2022.
[4] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, 2020.
[5] M. Khani, M. Alizadeh, J. Hoydis, and P. Fleming, “Adaptive neural signal detection for massive MIMO,” IEEE Trans. Wireless Commun., vol. 19, no. 8, pp. 5635–5648, 2020.
[6] N. Shlezinger, R. Fu, and Y. C. Eldar, “DeepSIC: Deep soft interference cancellation for multiuser MIMO detection,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1349–1362, 2021.
[7] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “ViterbiNet: A deep learning based Viterbi algorithm for symbol detection,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3319–3331, 2020.
[8] M. B. Fischer, S. Dörner, S. Cammerer, T. Shimizu, H. Lu, and S. Ten Brink, “Adaptive neural network-based OFDM receivers,” in Proc. IEEE SPAWC, 2022.
[9] S. Schibisch, S. Cammerer, S. Dörner, J. Hoydis, and S. ten Brink, “Online label recovery for deep learning-based communication through error correcting codes,” in Proc. IEEE ISWCS, 2018.
[10] R. A. Finish, Y. Cohen, T. Raviv, and N. Shlezinger, “Symbol-level online channel tracking for deep receivers,” in Proc. IEEE ICASSP, 2022, pp. 8897–8901.
[11] L. Huang, W. Pan, Y. Zhang, L. Qian, N. Gao, and Y. Wu, “Data augmentation for deep learning-based radio modulation classification,” IEEE Access, vol. 8, pp. 1498–1506, 2019.
[12] T. Raviv and N. Shlezinger, “Data augmentation for deep receivers,” IEEE Trans. Wireless Commun., 2023, early access.
[13] T. Raviv, S. Park, O. Simeone, Y. C. Eldar, and N. Shlezinger, “Online meta-learning for hybrid model-based deep receivers,” IEEE Trans. Wireless Commun., 2023, early access.
[14] M. Goutay, F. A. Aoudia, and J. Hoydis, “Deep hypernetwork-based MIMO detection,” in Proc. IEEE SPAWC, 2020.
[15] L. Chen, S. T. Jose, I. Nikoloska, S. Park, T. Chen, and O. Simeone, “Learning with limited samples: Meta-learning and applications to communication systems,” Foundations and Trends® in Signal Processing, vol. 17, no. 2, pp. 79–208, 2023.
[16] Y. Liu and O. Simeone, “Learning how to transfer from uplink to downlink via hyper-recurrent neural network for FDD massive MIMO,” IEEE Trans. Wireless Commun., vol. 21, no. 10, pp. 7975–7989, 2022.
[17] M. Zecchin, S. Park, O. Simeone, M. Kountouris, and D. Gesbert, “Robust Bayesian learning for reliable wireless AI: Framework and applications,” IEEE Trans. on Cogn. Commun. Netw., 2023, early access.
[18] T. Raviv, S. Park, O. Simeone, and N. Shlezinger, “Modular model-based Bayesian learning for uncertainty-aware and reliable deep MIMO receivers,” in Proc. IEEE ICC, 2023.
[19] N. Shlezinger, Y. C. Eldar, and S. P. Boyd, “Model-based deep learning: On the intersection of deep learning and optimization,” IEEE Access, vol. 10, pp. 115384–115398, 2022.
[20] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. on Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, 2017.
[21] M. Honkala, D. Korpi, and J. M. Huttunen, “DeepRx: Fully convolutional deep learning receiver,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3925–3940, 2021.
[22] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang, “Learning under concept drift: A review,” IEEE Trans. Knowl. Data Eng., vol. 31, no. 12, pp. 2346–2363, 2018.
[23] P. Singh, V. K. Verma, P. Rai, and V. P. Namboodiri, “Play and prune: Adaptive filter pruning for deep model compression,” arXiv preprint arXiv:1905.04446, 2019.
[24] P. G. Chang, K. P. Murphy, and M. Jones, “On diagonal approximations to the extended Kalman filter for online training of Bayesian neural networks,” in Continual Lifelong Learning Workshop at ACML 2022, 2022.
[25] T. Gafni, N. Shlezinger, K. Cohen, Y. C. Eldar, and H. V. Poor, “Federated learning: A signal processing perspective,” IEEE Signal Process. Mag., vol. 39, no. 3, pp. 14–41, 2022.
[26] N. Shlezinger and I. V. Bajić, “Collaborative inference for AI-empowered IoT devices,” IEEE Internet of Things Magazine, vol. 5, no. 4, pp. 92–98, 2022.
Tomer Raviv (tomerravi[email protected]) is currently pursuing his Ph.D.
degree in electrical engineering at Ben-Gurion University.
Sangwoo Park ([email protected]) is currently a research associate at
the Department of Engineering, King’s Communications, Learning and Infor-
mation Processing (KCLIP) lab, King’s College London, United Kingdom.
Osvaldo Simeone (osv[email protected]) is a Professor of Information
Engineering with the Centre for Telecommunications Research at the Depart-
ment of Engineering of King’s College London, where he directs the King’s
Communications, Learning and Information Processing lab.
Yonina C. Eldar ([email protected]) is a Professor in the Depart-
ment of Math and Computer Science, Weizmann Institute of Science, Israel,
where she heads the center for Biomedical Engineering and Signal Processing.
She is a member of the Israel Academy of Sciences and Humanities, an IEEE
Fellow and a EURASIP Fellow.
Nir Shlezinger ([email protected]) is an Assistant Professor in the School of
Electrical and Computer Engineering at Ben-Gurion University, Israel.